[ http://issues.apache.org/jira/browse/LUCENE-328?page=comments#action_12361490 ]
paul.elschot commented on LUCENE-328:
-------------------------------------

> 1. Any particular reason for SortedVIntList not to implement DocNrSkipper
> interface, the method getDocNrSkipper() is there, but declaration is missing.

The object returned by the getDocNrSkipper() method implements the interface
by adding a bit of state for the iteration over the document numbers.
This allows more than one iterator on the (non modifiable) SortedVIntList.

> 2. Should getDocNrSkipper() DocNrSkipper interface throw IOException? I have
> tried to add TermDocsSortedIntList to the family, but all methods in TermDocs
> are throwing IOException, and it is not nice to eat silently this exception
> too early in DocNrSkipper. Better ideas to deal with that?

I have no problem with adding a throws IOException clause to the
DocNrSkipper interface.
The idea is to filter query results from RAM, from which one would not
normally expect an IOException, so one could also consider rethrowing the
IOException wrapped in an Error.
OTOH, the ability to obtain a DocNrSkipper directly from an index is
certainly appealing, and then IOException is unavoidable.

> 3. Paul, why SkipFilter exists (here I refer to the JIRA-330 )? Wouldn't be
> better to use DocNrSkipper interface instead (SkipFilter does nothing but
> wrapping this interface). Also, the same question applies to IterFilter. Did
> I get something wrong here?

The presence of class BitSet in the bits() method of Filter makes it
impossible to provide another implementation of a Filter.
SkipFilter/DocNrSkipper are intended to be parallel to Filter/BitSet,
and the DocNrSkipper interface allows alternative implementations.
Both SkipFilter and Filter are interfaces for factories/caches of
filter data structures.
I'd like to somehow have these parallel paths merged, but I don't know
how to do that.
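
As an illustration of the answer to point 1, here is a minimal sketch of an immutable sorted list that hands out independent skipper objects, each with its own iteration state. The names DocNrSkipperSketch, SortedIntListSketch and the nextDocNr(int) signature are assumptions made for this example; the attached DocNrSkipper.java and SortedVIntList.java may well differ.

```java
/** Hypothetical stand-in for the attached DocNrSkipper interface; the real signature may differ. */
interface DocNrSkipperSketch {
    /** Returns the smallest stored document number >= target, or -1 if there is none. */
    int nextDocNr(int target);
}

/** Hypothetical immutable list of sorted document numbers (plain int[] here, not VInt bytes). */
final class SortedIntListSketch {
    private final int[] docNrs; // strictly increasing, never modified after construction

    SortedIntListSketch(int[] sortedDocNrs) {
        this.docNrs = sortedDocNrs.clone();
    }

    /** Each call returns a new object that holds its own position in the list. */
    DocNrSkipperSketch getDocNrSkipper() {
        return new DocNrSkipperSketch() {
            private int pos = 0; // per-skipper state; the list itself keeps none

            public int nextDocNr(int target) {
                // Advance until the current entry is at least target.
                while (pos < docNrs.length && docNrs[pos] < target) {
                    pos++;
                }
                return pos < docNrs.length ? docNrs[pos] : -1;
            }
        };
    }
}
```

Because the position lives in the returned object and not in the list, two skippers obtained from the same list can be advanced independently, which is the reason the list itself need not implement the interface.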
Perhaps SkipFilter could allow backward compatibility by also providing
a bits() method, and use that method when it does not throw, for example,
an UnsupportedOperationException.
Another way would be to deprecate Filter in favour of SkipFilter, but
that would have a lot of backward compatibility issues, and perhaps also
some performance issues.
The FilteredQuery of LUCENE-330 allows for both paths to be used; both
paths are joined at line 211 in this FilteredQuery.

The IterFilter of LUCENE-330 was replaced by SkipFilter; I forgot to
indicate that when I downloaded the replacement. I have just deleted
IterFilter there.

> Must say, excelent work!

Thanks. I should add that most of the hard work had already been done in
org.apache.lucene.store.InputStream.readVInt() and
org.apache.lucene.store.OutputStream.writeVInt().

Regards,
Paul Elschot

> Some utilities for a compact sparse filter
> ------------------------------------------
>
>          Key: LUCENE-328
>          URL: http://issues.apache.org/jira/browse/LUCENE-328
>      Project: Lucene - Java
>         Type: Improvement
>   Components: Search
>     Versions: CVS Nightly - Specify date in submission
>  Environment: Operating System: other
>               Platform: Other
>     Reporter: paul.elschot
>     Assignee: Lucene Developers
>     Priority: Minor
>  Attachments: AndDocNrSkipper.java, AndDocNrSkipper.java,
>               BitSetSortedIntList.java, DocNrSkipper.java, DocNrSkipper.java,
>               IntArraySortedIntList.java, IntArraySortedIntList.java, OrDocNrSkipper.java,
>               OrDocNrSkipper.java, SortedVIntList.java, SortedVIntList.java,
>               SortedVIntList.java, TestDocNrSkippers.java, TestDocNrSkippers.java,
>               TestSortedVIntList.java, TestSortedVIntList.java, TestSortedVIntList.java,
>               intIterator.java
>
> Two files are attached that might form the basis for an alternative
> filter implementation that is more memory efficient than one bit
> per doc when less than about 1/8 of the docs pass through the filter.
>
> The document numbers are stored in RAM as VInt's from the Lucene index
> format.
> These VInt's encode the difference between two successive
> document numbers, much like a PositionDelta in the Positions:
> http://jakarta.apache.org/lucene/docs/fileformats.html
>
> The getByteSize() method can be used to verify the compression
> once a SortedVIntList is constructed.
> The precise conditions under which this is more memory efficient than
> one bit per document are not easy to specify in advance.
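
The storage scheme in the issue description (successive differences of sorted document numbers, each written as a VInt) can be illustrated with a minimal sketch. The class VIntDeltaSketch and its encode/decode methods are hypothetical names for this example and are not taken from the attached SortedVIntList.java. The byte layout follows the VInt definition in the Lucene file formats: seven data bits per byte, low-order bits first, high bit set when more bytes follow.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of VInt delta coding; not the attached SortedVIntList.java. */
final class VIntDeltaSketch {

    /** Encodes strictly increasing, non-negative document numbers as VInt deltas. */
    static byte[] encode(int[] sortedDocNrs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int docNr : sortedDocNrs) {
            int delta = docNr - last; // difference to the previous document number
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80); // low 7 bits, continuation bit set
                delta >>>= 7;
            }
            out.write(delta); // final byte, continuation bit clear
            last = docNr;
        }
        return out.toByteArray();
    }

    /** Decodes count document numbers from bytes produced by encode(). */
    static int[] decode(byte[] bytes, int count) {
        int[] docNrs = new int[count];
        int pos = 0;
        int last = 0;
        for (int i = 0; i < count; i++) {
            byte b = bytes[pos++];
            int delta = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
            }
            last += delta; // undo the delta coding
            docNrs[i] = last;
        }
        return docNrs;
    }
}
```

Every stored document number costs at least one byte here, while a bit set costs one bit for every document in the index. That is where the rough 1/8 threshold comes from; larger gaps need two or more bytes, so the actual break-even point depends on how the document numbers are distributed.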