Re: Lucene 1.4 - lobby for final release

Erik Hatcher Fri, 26 Mar 2004 14:19:37 -0800

On Mar 26, 2004, at 3:32 PM, Stephane James Vaucher wrote:

I'm personally a fan of a release small but often approach, but what are the new features available in 1.4 (a list would be nice, on the wiki perhaps)? Will there be interim builds available to try these new features out soon?

There is a CHANGES.txt in the root of the jakarta-lucene CVS repository that stays pretty much current and accurate. I'm pasting it below for the 1.3 -> CVS HEAD changes.

There seem to be no nightly builds on:

http://cvs.apache.org/builds/jakarta-lucene/nightly/

I guess at this time you will have to build it yourself from CVS. There is one show-stopper before we can release an RC1. We must fully convert to ASL 2.0 (meaning every single source file needs the license header as well as any other files that can be tagged with it). I know Otis has changed some files, but we need a full sweep. There have been some utilities posted in a committers area to facilitate this change more automatically if we want to use them.

Erik

excerpt from CHANGES.txt

1.4 RC1

1. Changed the format of the .tis file, so that:

    - it has a format version number, which makes it easier to
      back-compatibly change file formats in the future.

    - the term count is now stored as a long.  This was the one aspect
      of the Lucene's file formats which limited index size.

    - a few internal index parameters are now stored in the index, so
      that they can (in theory) now be changed from index to index,
      although there is not yet an API to do so.

    These changes are back compatible.  The new code can read old
    indexes.  But old code will not be able read new indexes. (cutting)

 2. Added an optimized implementation of TermDocs.skipTo().  A skip
    table is now stored for each term in the .frq file.  This only
    adds a percent or two to overall index size, but can substantially
    speedup many searches.  (cutting)

 3. Restructured the Scorer API and all Scorer implementations to take
    advantage of an optimized TermDocs.skipTo() implementation.  In
    particular, PhraseQuerys and conjunctive BooleanQuerys are
    faster when one clause has substantially fewer matches than the
    others.  (A conjunctive BooleanQuery is a BooleanQuery where all
    clauses are required.)  (cutting)

 4. Added new class ParallelMultiSearcher.  Combined with
    RemoteSearchable this makes it easy to implement distributed
    search systems.  (Jean-Francois Halleux via cutting)

 5. Added support for hit sorting.  Results may now be sorted by any
    indexed field.  For details see the javadoc for
    Searcher#search(Query, Sort).  (Tim Jones via Cutting)

 6. Changed FSDirectory to auto-create a full directory tree that it
    needs by using mkdirs() instead of mkdir().  (Mladen Turk via Otis)

 7. Added a new span-based query API.  This implements, among other
    things, nested phrases.  See javadocs for details.  (Doug Cutting)

 8. Added new method Query.getSimilarity(Searcher), and changed
    scorers to use it.  This permits one to subclass a Query class so
    that it can specify it's own Similarity implementation, perhaps
    one that delegates through that of the Searcher.  (Julien Nioche
    via Cutting)

 9. Added MultiReader, an IndexReader that combines multiple other
    IndexReaders.  (Cutting)

10. Added support for term vectors.  See Field#isTermVectorStored().
    (Grant Ingersoll, Cutting & Dmitry)

11. Fixed the old bug with escaping of special characters in query
    strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
    (Jean-Francois Halleux via Otis)

12. Added support for overriding default values for the following,
    using system properties:
      - default commit lock timeout
      - default maxFieldLength
      - default maxMergeDocs
      - default mergeFactor
      - default minMergeDocs
      - default write lock timeout
    (Otis)

13. Changed QueryParser.jj to allow '-' and '+' within tokens:
    http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
    (Morus Walter via Otis)

14. Changed so that the compound index format is used by default.
    This makes indexing a bit slower, but vastly reduces the chances
    of file handle problems.  (Cutting)


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Lucene 1.4 - lobby for final release

Reply via email to