Hi,

There is a way to make this work, and it is the "official" one: your
application is already on Lucene 3.6, so why not simply use the
IndexUpgrader class that ships with Lucene 3.6? It upgrades your users'
existing indexes (back to version 1.0) to the latest 3.6 file format. After
that, the index is readable with Lucene 4.x (but not with Lucene 5, aka
trunk). If your users then move to Lucene 4, they can read the indexes.
Ideally, you would also run the 4.x IndexUpgrader to bring the indexes to the
4.x format when they are opened for the first time with the new version.
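
A minimal sketch of that upgrade step, assuming Lucene 3.6 is on the
classpath. The class name UpgradeTo36 and passing the index path as the first
command-line argument are placeholders for illustration, not part of Lucene:

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.IndexUpgrader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical helper class; only the Lucene types are real.
final class UpgradeTo36 {

    // Rewrites all segments in an older format to the 3.6 format.
    // With this constructor prior commits are not deleted, so upgrade()
    // refuses to run if the index holds more than one commit point.
    static void upgrade(Directory dir) throws IOException {
        new IndexUpgrader(dir, Version.LUCENE_36).upgrade();
    }

    public static void main(String[] args) throws IOException {
        Directory dir = FSDirectory.open(new File(args[0]));
        try {
            upgrade(dir);
        } finally {
            dir.close();
        }
    }
}
```

The same pattern should apply on the 4.x side with the 4.x jar and the
matching Version constant.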

IndexUpgrader is in fact just a thin wrapper around a merge policy that makes
IndexWriter#forceMerge(1) always merge all segments in an older format, even
if the index already consists of only one segment. So you can also instantiate
an IndexWriter as usual and set an UpgradeIndexMergePolicy on the
IndexWriterConfig. This merge policy is just a wrapper around another one,
such as the default TieredMergePolicy.
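
The IndexWriter variant could look roughly like this, again assuming Lucene
3.6. The class name UpgradeViaMergePolicy is a placeholder, and the null
analyzer only works because no documents are added through this writer:

```java
import java.io.IOException;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.index.UpgradeIndexMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

// Hypothetical helper class; only the Lucene types are real.
final class UpgradeViaMergePolicy {

    static void upgrade(Directory dir) throws IOException {
        // Assumption: this writer only merges, so no analyzer is supplied.
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, null);
        // APPEND fails if there is no index instead of creating an empty one.
        config.setOpenMode(IndexWriterConfig.OpenMode.APPEND);
        // Wrap the regular policy so forceMerge also picks up old-format segments.
        config.setMergePolicy(new UpgradeIndexMergePolicy(new TieredMergePolicy()));

        IndexWriter writer = new IndexWriter(dir, config);
        try {
            writer.forceMerge(1);
        } finally {
            writer.close();
        }
    }
}
```

Segments that are already in the current format are left alone by the
wrapper, so running this on an up-to-date index should be cheap.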

About the technical problem: it would not be enough to pass a custom codec to
the 4.x IndexReader. Those old indexes do not really support all the semantics
that Lucene 3 and 4 offer, and a lot of code in IndexReader and
SegmentCoreReaders cannot handle them. Even if you made it work, you would
still hit the main issue that, for example, the 3.x codec has to deal with:
the order of terms, which changed from UTF-16 to UTF-8 order.
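
To illustrate why the term order matters, here is a small plain-Java sketch
(no Lucene involved; the class and method names are made up for this example).
It compares two terms once by UTF-16 code units and once by UTF-8 bytes. The
two orders disagree when a character from the upper end of the Basic
Multilingual Plane meets a supplementary character:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Lucene.
final class TermOrderDemo {

    // UTF-16 order: String.compareTo compares char (code unit) by char.
    static int compareUtf16(String a, String b) {
        return a.compareTo(b);
    }

    // UTF-8 order: compare the encoded bytes as unsigned values.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";                 // U+FF5E, one UTF-16 code unit
        String supplementary = "\uD83D\uDE00"; // U+1F600, a surrogate pair

        // UTF-16: 0xFF5E sorts after the high surrogate 0xD83D.
        System.out.println(compareUtf16(bmp, supplementary) > 0); // true
        // UTF-8: lead byte 0xEF sorts before lead byte 0xF0.
        System.out.println(compareUtf8(bmp, supplementary) < 0);  // true
    }
}
```

A terms dictionary written in one order and read by code that expects the
other would therefore see such terms out of sequence.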

If you really want to read older indexes, the following might work:
- Clone the Lucene 3.x codec into a private package and change it to support
older indexes (this will be very hard).
- Add META-INF/services metadata to make Lucene 4 load *your* custom Lucene 3.x
codec instead of the shipped one from the classpath. The codec must have the
name "Lucene3x" (although it is also for older indexes).
But I am still not sure that this works completely, because IndexReader and
IndexWriter may throw IndexFormatTooOldException before the codec can actually
kick in!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Trejkaz [mailto:trej...@trypticon.org]
> Sent: Monday, June 09, 2014 2:54 PM
> To: Lucene Users Mailing List
> Subject: Re: Reading a v2 index in v4
> 
> On Mon, Jun 9, 2014 at 10:17 PM, Adrien Grand <jpou...@gmail.com>
> wrote:
> > Hi,
> >
> > It is not possible to read 2.x indices from Lucene 4, even with a
> > custom codec. For instance, Lucene 4 needs to hook into
> > SegmentInfos.read to detect old 3.x indices and force the use of the
> > Lucene3x codec since these indices don't expose what codec has been
> > used to write them.
> 
> Rats. I was wondering how the Lucene3x codec worked, but now I know. I
> was hoping codecs were going to be more flexible than that, but it looks like
> nobody considered the possibility that I might want to pass a Codec into my
> IndexReader. :(
> 
> TX
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org

