How we would do it:

- update the index format to v7 (this in itself is fiddly, but there are ways)
- open the in-place migrated index:
  - get all the leaf readers and wrap each in a new subclass of
    FilterCodecReader
  - override getPointsReader() on that subclass to return a correctly
    implemented PointsReader, which can read the data from the stored fields
  - be careful about the order in which you return the points
  - you might want to spool the points to a database like Derby or H2,
    since with a lot of data there is a risk of running out of memory
- copy that whole index to a new index using
  IndexWriter#addIndexes(CodecReader...)
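
To make the ordering point a bit more concrete, here is a small stdlib-only sketch (no Lucene on the classpath). It encodes each int the way IntPoint does (sign bit flipped, big-endian, so that unsigned byte order matches numeric order) and then sorts the (doc ID, value) pairs. Which order is actually required is not spelled out above; this sketch assumes ascending by encoded value, with ties broken by doc ID. The SortablePoints and Point names are made up for illustration, and a real PointsReader would stream these pairs from the stored fields (or from the spool database) rather than hold them in a list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortablePoints {

    /** One (docId, encoded value) pair read back from a stored field. Hypothetical helper type. */
    static final class Point {
        final int docId;
        final byte[] packed;

        Point(int docId, int value) {
            this.docId = docId;
            this.packed = encode(value);
        }
    }

    /** Flip the sign bit and write big-endian, so unsigned byte order equals int order. */
    static byte[] encode(int value) {
        int v = value ^ 0x80000000;
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }

    /** Inverse of encode(). */
    static int decode(byte[] b) {
        int v = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16) | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        return v ^ 0x80000000;
    }

    /** Compare two equal-length byte arrays, treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    /** Assumed order: ascending by encoded value, then by doc ID. */
    static final Comparator<Point> ORDER = (a, b) -> {
        int c = compareUnsigned(a.packed, b.packed);
        return c != 0 ? c : Integer.compare(a.docId, b.docId);
    };

    /** Build the pairs in memory and sort them; fine for a demo, not for a huge index. */
    static List<Point> sorted(int[] docIds, int[] values) {
        List<Point> points = new ArrayList<>();
        for (int i = 0; i < docIds.length; i++) {
            points.add(new Point(docIds[i], values[i]));
        }
        points.sort(ORDER);
        return points;
    }

    public static void main(String[] args) {
        for (Point p : sorted(new int[] {0, 1, 2, 3}, new int[] {5, -3, 5, 0})) {
            System.out.println("doc " + p.docId + " -> " + decode(p.packed));
        }
    }
}
```

With a spool database the same idea would become an ORDER BY over the encoded value and doc ID columns instead of an in-memory sort.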

Copying the docs works too if you still have the original text stored, but we
didn’t, so we use this sort of technique for all Lucene migrations.

TX

On Thu, 6 Jun 2019 at 07:07, Riccardo Tasso <riccardo.ta...@gmail.com> wrote:

> Ok,
> I know this policy and you perfectly explained why it makes sense.
>
> Anyway my index is really big and contains mostly textual data which is
> expensive to reindex (because of custom analysis).
>
> Considering that the IndexUpgrader will efficiently do most of the work,
> I should investigate how to fill this gap without reindexing from scratch.
>
> The most efficient approach I can figure is:
> * convert from 4 to 7
> * open an index reader and an index writer on the 7 index
> * iterate every document
> * read the numeric field (since it's already stored)
> * add to each document the IntPoint field
> * update the document on the index
>
> I guess the expensive task here is the update, since it will delete and
> re-add the document, but in this case I think I will save the analysis
> costs.
>
> Do you think there's a better way of doing this reindex?
>
> Thanks
>
> On Wed, 5 Jun 2019, 17:41 Erick Erickson <erickerick...@gmail.com> wrote:
>
> > You cannot upgrade more than one major version, you must re-index from
> > scratch. There’s a long discussion of why, but basically it’s summed up by
> > this quote from Robert Muir:
> >
> > “I think the key issue here is Lucene is an index not a database. Because
> > it is a lossy index and does not retain all of the user's data, its not
> > possible to safely migrate some things automagically. In the norms case
> > IndexWriter needs to re-analyze the text ("re-index") and compute stats to
> > get back the value, so it can be re-encoded. The function is y = f(x) and
> > if x is not available its not possible, so lucene can't do it.”
> >
> > This has always been true; before 8x it would just fail silently, as you
> > have found.
> > Solr/Lucene starts up but doesn’t work quite as expected. As of
> > Lucene 8x, Lucene (and therefore Solr) will not even open an index that
> > has _ever_ been touched by Lucene 6x, no matter what intervening steps
> > have been taken. Or in general, starting with 8x, Lucene/Solr X will not
> > open indexes touched by X-2, rather than behave unexpectedly.
> >
> > Best,
> > Erick
> >
> > > On Jun 5, 2019, at 8:27 AM, Riccardo Tasso <riccardo.ta...@gmail.com>
> > > wrote:
> > >
> > > Hello everybody,
> > > I have a (very big) Lucene 4 index with documents using IntField. On that
> > > field, which should be stored and sortable, I should search and execute
> > > range queries.
> > >
> > > I've tried to upgrade it from 4 to 7 with IndexUpgrader, but I observed
> > > that IntFields aren't searchable anymore.
> > >
> > > What is the most efficient way to convert IntFields to IntPoints, which
> > > are stored and sortable?
> > >
> > > Thanks,
> > > Riccardo