Thanks Michael for help, this helped me with my problem. Regards Harshvardhan Ojha
On Thu, Feb 13, 2014 at 8:51 PM, Michael McCandless < [email protected]> wrote: > The bloom filter is only used by the postings format wrapper, and > we've had mixed results on whether it helps performance or not (seems > to depend heavily on the exact usage). > > We have bit set / iterator abstractions (oal.util.Bits, > oal.search.DocIdSet/Iterator) to manage "sets" of documents, but most > implementations don't use a hash set under the hood. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Thu, Feb 13, 2014 at 7:11 AM, Harshvardhan Ojha > <[email protected]> wrote: > > Hi Mike/Mikhail, > > > > Don't you guys think org.apache.lucene.codecs.bloom.FuzzySet.java, > > contains(BytesRef value) methods returns probablity of having a field, > and > > it is a place where we are using hashing ? > > > > Are there any other place in source which when given with document id, > could > > determine by calculating its hash and say if document with this id is > > present or not in a single lookup O(1) ? > > > > Regards > > Harshvardhan Ojha > > > > > > On Thu, Feb 13, 2014 at 5:11 PM, Michael McCandless > > <[email protected]> wrote: > >> > >> Lucene only assigns its int docID during indexing. > >> > >> Retrieving a previously stored document is a O(1), but that involves a > >> disk seek which can be very costly when the page is not in the OS's IO > >> cache. Lucene does not do any caching itself (relies on the OS > >> instead). > >> > >> Have a look at the current default stored fields codec format: > >> > >> > lucene/core/src/java/org/apache/lucene/codec/lucene41/Lucene41StoredFieldsFormat > >> for details. > >> > >> Mike McCandless > >> > >> http://blog.mikemccandless.com > >> > >> > >> On Wed, Feb 12, 2014 at 11:27 PM, Harshvardhan Ojha > >> <[email protected]> wrote: > >> > Hi All, > >> > > >> > I have a question regarding retrieval of documents by lucene. > >> > I know lucene uses many files on disk to keep documents, each > comprising > >> > fields in it, and uses many IR algorithms, and inverted index to match > >> > documents. > >> > > >> > My question is : > >> > 1. How lucene stores these documents inside file system and gets it so > >> > fast? > >> > 2. Does lucene uses any Hashing algorithm to get docs in O(1) ? If not > >> > which > >> > DS is used by lucene ? > >> > 3. Except id provided by us at the time of indexing, is there any > other > >> > unique identifier which is assigned by lucene to its documents ? > >> > > >> > I will appreciate If someone can provide me with source file names to > >> > study > >> > these algorithms in detail. > >> > > >> > Regards > >> > Harshvardhan Ojha > >> > > >> > > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: [email protected] > >> For additional commands, e-mail: [email protected] > >> > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
