How about indexing a field with your application-centric id? This is _the_ way this sort of thing is handled. You could then query for a specific id using a TermQuery.
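
To make that concrete, here is a rough, self-contained sketch of the idea. It is a toy model in plain Java, not Lucene's API: the class ToyIndex and its add/lookup methods are made up for illustration. The index still hands out its own sequential internal numbers, while the application id is stored as an exact-match field and used for lookups, so it stays stable whatever the insertion order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for an index; NOT Lucene's API. All names are hypothetical.
final class ToyIndex {
    // Internal docIDs are simply positions in this list (sequential, no gaps).
    private final List<String> appIds = new ArrayList<>();
    // "Postings" for the untokenized id field: application id -> internal docID.
    private final Map<String, Integer> idField = new HashMap<>();

    // Adds a document and returns the internal docID the index assigned to it.
    int add(String appId) {
        int docId = appIds.size();
        appIds.add(appId);
        idField.put(appId, docId);
        return docId;
    }

    // Exact-match lookup on the id field, analogous to a single-term query.
    // Returns -1 when no document carries that id.
    int lookup(String appId) {
        return idField.getOrDefault(appId, -1);
    }

    public static void main(String[] args) {
        // Same two documents, inserted in opposite orders into two indexes.
        ToyIndex first = new ToyIndex();
        first.add("doc-900");
        first.add("doc-17");

        ToyIndex second = new ToyIndex();
        second.add("doc-17");
        second.add("doc-900");

        // The internal numbers differ, but "doc-17" finds the document in both.
        System.out.println(first.lookup("doc-17"));   // prints 1
        System.out.println(second.lookup("doc-17"));  // prints 0
    }
}
```

The application only ever refers to "doc-17"; the internal number is an implementation detail that can differ from index to index, and the application ids are free to be large or to have gaps.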

    Erik



On Oct 11, 2005, at 11:58 AM, Shane O'Sullivan wrote:

Hi all,

As far as I understand today, Lucene assigns docIDs to documents according
to the order in which the documents are added to the index. Hence, docIDs
are assigned by the engine in a sequential manner, without gaps. This order
of document identifiers then determines the order of the postings in the
postings lists, i.e. all postings lists are sorted by docID. It also means
that the same document appearing in two different indices would probably
not have the same docID (unless some extreme care was taken to insert
documents in the same order).

There are situations where the application wants to determine the docIDs
for the index itself, i.e. to control the ordering of occurrences in the
postings lists. This is useful to ensure, for example, that a document has
a stable and consistent identifier regardless of the order in which
documents are inserted into an index.

In such cases, the application would want to pass the numeric identifier of
the document into the index. However, such identifiers may not be
sequential, i.e. it's possible that there would be a document with docID M
without there being any document whose docID is M-1.

Q1. How difficult would it be to change Lucene to accept the docIDs from the
application, and not care about any possible gaps those IDs may have?
One possible problem is that, since the docIDs could become very large and
are non-sequential, creating a single array for them all would not be
feasible.

Q2. Does Lucene's search code depend on the fact that document IDs are
sequential?

Thanks

Shane


