Caution: this commit contains a lot of substantial changes. I've tested things pretty well, but there may well be more bugs. Please consider CVS a little flakier than usual right now, until a few folks have tested these changes.

But please do give these changes a try! They should make a lot of phrase and conjunctive queries faster, especially with big indexes. Tell me if you have any problems.

Cheers,

Doug

[EMAIL PROTECTED] wrote:
  +
  + 1. Changed the format of the .tis file, so that:
  +
  +    - it has a format version number, which makes it easier to
  +      back-compatibly change file formats in the future.
  +
  +    - the term count is now stored as a long.  This was the one aspect
  +      of Lucene's file formats which limited index size.
  +
  +    - a few internal index parameters are now stored in the index, so
  +      that they can (in theory) now be changed from index to index,
  +      although there is not yet an API to do so.
  +
  +    These changes are back compatible.  The new code can read old
  +    indexes.  But old code will not be able to read new indexes. (cutting)
  +
  + 2. Added an optimized implementation of TermDocs.skipTo().  A skip
  +    table is now stored for each term in the .frq file.  This only
  +    adds a percent or two to overall index size, but can substantially
  +    speed up many searches.  (cutting)
  +
  + 3. Restructured the Scorer API and all Scorer implementations to take
  +    advantage of an optimized TermDocs.skipTo() implementation.  In
  +    particular, PhraseQuerys and conjunctive BooleanQuerys are
  +    faster when one clause has substantially fewer matches than the
  +    others.  (A conjunctive BooleanQuery is a BooleanQuery where all
  +    clauses are required.)  (cutting)
  +
  +
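
To illustrate the idea behind item 1, here is a minimal sketch of how a format version number lets new code read both old and new headers. It is not the actual .tis layout: the class name, the FORMAT value, and the header layout are assumptions made up for this example. The only idea it relies on is that an old-style header starts with a non-negative int count, so a negative first int can safely mark a versioned header that is followed by a long count.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; this is not Lucene's real file layout.
class VersionedHeader {
    // Assumed marker value. It is negative so that it can never be
    // mistaken for a legacy header, which begins with a count >= 0.
    static final int FORMAT = -2;

    // New-style header: format marker (int) followed by the count (long).
    static byte[] encode(long termCount) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + Long.BYTES);
        buf.putInt(FORMAT);
        buf.putLong(termCount);
        return buf.array();
    }

    // Legacy header: just the count, stored as an int.
    static byte[] encodeLegacy(int termCount) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(termCount).array();
    }

    // Reads either header style and returns the term count.
    static long decode(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header);
        int first = buf.getInt();
        if (first >= 0) {
            return first;          // legacy file: the first int is the count
        }
        if (first != FORMAT) {
            throw new IllegalStateException("Unknown format version: " + first);
        }
        return buf.getLong();      // versioned file: the count is a long
    }

    public static void main(String[] args) {
        System.out.println(decode(encodeLegacy(42)));
        System.out.println(decode(encode(5_000_000_000L)));
    }
}
```

A reader written before the marker existed would take the negative first int for a count, which is one way to see why old code cannot read the new format.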

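For items 2 and 3, the following is a rough in-memory sketch of why a skip table helps conjunctive queries. It is not the code in this commit: the class SkipListPostings, its SKIP_INTERVAL, and the array-based skip table are all assumptions for illustration, whereas the real skip data lives in the .frq file. The sketch keeps one skip entry per block of postings, lets skipTo() jump over whole blocks, and then intersects two posting lists by repeatedly skipping whichever list is behind to the other's current document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and layout are not Lucene's.
class SkipListPostings {
    static final int SKIP_INTERVAL = 4;              // assumed; kept tiny for the demo
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;      // strictly increasing doc ids, all < NO_MORE_DOCS
    private final int[] skipDocs;  // skipDocs[k] == docs[k * SKIP_INTERVAL]
    private int pos = 0;           // index of the current doc

    SkipListPostings(int... sortedDocs) {
        docs = sortedDocs;
        skipDocs = new int[(docs.length + SKIP_INTERVAL - 1) / SKIP_INTERVAL];
        for (int k = 0; k < skipDocs.length; k++) {
            skipDocs[k] = docs[k * SKIP_INTERVAL];
        }
    }

    int doc() {
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    // Advances to the first doc >= target and returns it.
    int skipTo(int target) {
        // Jump to the start of each later block whose first doc is <= target;
        // every doc before such a block is smaller than the target.
        int block = pos / SKIP_INTERVAL + 1;
        while (block < skipDocs.length && skipDocs[block] <= target) {
            pos = block * SKIP_INTERVAL;
            block++;
        }
        // Finish with a short linear scan inside the final block.
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return doc();
    }

    // Conjunction: returns the docs that appear in both lists.
    static List<Integer> intersect(SkipListPostings a, SkipListPostings b) {
        List<Integer> hits = new ArrayList<>();
        int da = a.doc();
        int db = b.doc();
        while (da != NO_MORE_DOCS && db != NO_MORE_DOCS) {
            if (da == db) {
                hits.add(da);
                da = a.skipTo(da + 1);
                db = b.skipTo(db + 1);
            } else if (da < db) {
                da = a.skipTo(db);   // a is behind: skip it up to b's doc
            } else {
                db = b.skipTo(da);   // b is behind: skip it up to a's doc
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] common = new int[21];
        for (int i = 0; i < common.length; i++) {
            common[i] = 5 * i;       // 0, 5, 10, ..., 100
        }
        SkipListPostings rare = new SkipListPostings(7, 40, 95);
        System.out.println(intersect(rare, new SkipListPostings(common)));
    }
}
```

When one list is much shorter than the other, most of the long list is passed over a block at a time rather than one document at a time, which is the case described in item 3 where one clause has substantially fewer matches than the others.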



