Re: Algorithm of retrieving docs

Harshvardhan Ojha Thu, 13 Feb 2014 04:12:16 -0800

Hi Mike/Mikhail,

Don't you think that the contains(BytesRef value) method in
org.apache.lucene.codecs.bloom.FuzzySet tells us whether a field value is
probably present, and that this is a place where hashing is used?
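
To make clear what I mean by a hashed membership test, here is a rough
sketch. It is only a toy, not Lucene's FuzzySet code: the class name
TinyBloom, the FNV-1a-style hash and the sizes are all made up for
illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical toy Bloom filter for illustration; not Lucene's FuzzySet. */
class TinyBloom {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    TinyBloom(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // FNV-1a-style hash over the UTF-8 bytes, with a per-probe seed mixed in.
    private int slot(String value, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return Math.floorMod(h, size);
    }

    // Sets one bit per probe for the given value.
    void add(String value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(slot(value, i));
        }
    }

    /** false means definitely absent; true only means "maybe present". */
    boolean mightContain(String value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(slot(value, i))) {
                return false;
            }
        }
        return true;
    }
}
```

With something like this, add("doc-42") followed by mightContain("doc-42")
always gives true, while a value that was never added usually gives false
but can occasionally give a false positive. The check costs a fixed number
of hash probes, so it is constant time for a fixed key length, but it can
only say "maybe" or "no", never a certain "yes".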


Is there any other place in the source which, given a document id, could
compute its hash and tell whether a document with this id is present or
not in a single O(1) lookup?

Regards
Harshvardhan Ojha


On Thu, Feb 13, 2014 at 5:11 PM, Michael McCandless <
[email protected]> wrote:

> Lucene only assigns its int docID during indexing.
>
> Retrieving a previously stored document is O(1), but that involves a
> disk seek which can be very costly when the page is not in the OS's IO
> cache.  Lucene does not do any caching itself (relies on the OS
> instead).
>
> Have a look at the current default stored fields codec format:
>
> lucene/core/src/java/org/apache/lucene/codecs/lucene41/Lucene41StoredFieldsFormat
> for details.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Feb 12, 2014 at 11:27 PM, Harshvardhan Ojha
> <[email protected]> wrote:
> > Hi All,
> >
> > I have a question regarding retrieval of documents by lucene.
> > I know lucene uses many files on disk to keep documents, each comprising
> > fields in it, and uses many IR algorithms, and inverted index to match
> > documents.
> >
> > My questions are:
> > 1. How does Lucene store these documents in the file system and
> > retrieve them so fast?
> > 2. Does Lucene use any hashing algorithm to get docs in O(1)? If not,
> > which data structure is used by Lucene?
> > 3. Apart from the id provided by us at indexing time, is there any
> > other unique identifier which Lucene assigns to its documents?
> >
> > I would appreciate it if someone could provide me with source file
> > names to study these algorithms in detail.
> >
> > Regards
> > Harshvardhan Ojha
> >
> >
>
