Re: Adding fields with same field type complains that they have different term vector settings

Michael McCandless Mon, 29 Jun 2020 07:24:42 -0700

Hi Albert,

Unfortunately, you have fallen into a common and sneaky Lucene trap.


The problem happens because you loaded a Document from the index's stored
fields (the one you previously indexed) and then tried to modify that one
and re-index.

Lucene does not guarantee that this will work, because Lucene does not
store all information necessary to precisely reconstruct the original
document you had indexed.

The Document you loaded from the index is subtly different from the one you
had previously indexed.  In particular, your custom FIELD_TYPE details were
lost.

To sidestep this tar pit you must fully reconstruct the document yourself
each time you add it to the index.
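
As a rough illustration of "reconstruct the document yourself", here is a minimal sketch, not a definitive implementation. It assumes Lucene 8.x and a stored field named "f1" as in the test case below. The class name RebuildDocExample and the helpers rebuild(), writeDoc() and demo() are hypothetical names made up for this example. It also swaps in ByteBuffersDirectory and StandardAnalyzer so that it only needs lucene-core. The idea is to copy just the stored string values out of the loaded Document and wrap them in fresh Field instances that use your own FIELD_TYPE, so every instance of "f1" in the document carries identical term vector settings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class RebuildDocExample {

    // Same settings as FIELD_TYPE in the quoted test case.
    private static final FieldType FIELD_TYPE = new FieldType();

    static {
        FIELD_TYPE.setStored(true);
        FIELD_TYPE.setTokenized(true);
        FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        FIELD_TYPE.setStoreTermVectors(true);
        FIELD_TYPE.setStoreTermVectorPayloads(true);
        FIELD_TYPE.setStoreTermVectorPositions(true);
        FIELD_TYPE.setStoreTermVectorOffsets(true);
        FIELD_TYPE.freeze();
    }

    // Hypothetical helper: take only the stored values of "f1" from the loaded
    // document and build brand new Fields with our own FieldType.
    static Document rebuild(Document loaded, String... extraValues) {
        Document fresh = new Document();
        for (String value : loaded.getValues("f1")) {
            fresh.add(new Field("f1", value, FIELD_TYPE));
        }
        for (String value : extraValues) {
            fresh.add(new Field("f1", value, FIELD_TYPE));
        }
        return fresh;
    }

    private static void writeDoc(Directory dir, Document doc) throws IOException {
        IndexWriterConfig conf = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, conf)) {
            // addDocument appends, as in the original test. To replace the old
            // copy you would need a unique key field and updateDocument.
            writer.addDocument(doc);
        }
    }

    // Indexes "foo", loads the stored document, rebuilds it with "bar" added,
    // indexes the rebuilt document, and returns the number of live documents.
    static int demo() {
        try (Directory dir = new ByteBuffersDirectory()) {
            Document doc = new Document();
            doc.add(new Field("f1", "foo", FIELD_TYPE));
            writeDoc(dir, doc);

            Document loaded;
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                loaded = new IndexSearcher(reader).doc(0);
            }

            // Do not add to `loaded` directly; index a rebuilt copy instead.
            writeDoc(dir, rebuild(loaded, "bar"));

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return reader.numDocs();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("docs in index: " + demo());
    }
}
```

In a real application the more robust variant is to rebuild from your own system of record rather than from stored fields, since anything that was indexed but not stored cannot be recovered this way.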

Mike McCandless

http://blog.mikemccandless.com


On Mon, Jun 29, 2020 at 9:56 AM Albert MacSweeny <
albert.macswe...@profium.com> wrote:

> Hi,
>
> I'm upgrading a project from Lucene 3.0.0 to 8.5.2.
>
> Some tests are failing with a strange issue. The gist of it is that we
> create fields that need position and offset information. Inserting one
> field works, but searching for the document and then adding another value
> for the same field results in the following exception:
>
> java.lang.IllegalArgumentException: all instances of a given field name
> must have the same term vectors settings (storeTermVectorPositions changed
> for field="f1")
>     at org.apache.lucene.index.TermVectorsConsumerPerField.start(TermVectorsConsumerPerField.java:166)
>     at org.apache.lucene.index.TermsHashPerField.start(TermsHashPerField.java:294)
>     at org.apache.lucene.index.FreqProxTermsWriterPerField.start(FreqProxTermsWriterPerField.java:72)
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:810)
>     at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:442)
>     at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:406)
>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250)
>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:495)
>     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1213)
>     at com.profium.sir.LuceneTest.writeDoc(LuceneTest.java:66)
>     at com.profium.sir.LuceneTest.testLucene(LuceneTest.java:58)
>
> This is happening even though the exact same FieldType object is being
> used in the field each time, and it is frozen.
>
> I've isolated the problem to the following code snippet which reproduces
> it:
>
>
>     import java.io.IOException;
>     import java.nio.file.Path;
>
>     import org.apache.lucene.analysis.en.EnglishAnalyzer;
>     import org.apache.lucene.document.Document;
>     import org.apache.lucene.document.Field;
>     import org.apache.lucene.document.FieldType;
>     import org.apache.lucene.index.DirectoryReader;
>     import org.apache.lucene.index.IndexOptions;
>     import org.apache.lucene.index.IndexWriter;
>     import org.apache.lucene.index.IndexWriterConfig;
>     import org.apache.lucene.search.IndexSearcher;
>     import org.apache.lucene.store.Directory;
>     import org.apache.lucene.store.MMapDirectory;
>
>     public class LuceneTest {
>
>         private static FieldType FIELD_TYPE = new FieldType();
>
>         static {
>             FIELD_TYPE.setStored(true);
>             FIELD_TYPE.setTokenized(true);
>             FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
>             FIELD_TYPE.setStoreTermVectors(true);
>             FIELD_TYPE.setStoreTermVectorPayloads(true);
>             FIELD_TYPE.setStoreTermVectorPositions(true);
>             FIELD_TYPE.setStoreTermVectorOffsets(true);
>             FIELD_TYPE.freeze();
>         }
>
>         public static void main(String[] args) throws IOException {
>             testLucene();
>         }
>
>         public static void testLucene() throws IOException {
>             Document doc = new Document();
>             doc.add(new Field("f1", "foo", FIELD_TYPE));
>             writeDoc(doc);
>             IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(getDirectory()));
>             doc = searcher.doc(0);
>
>             doc.add(new Field("f1", "bar", FIELD_TYPE));
>             writeDoc(doc);
>         }
>
>         private static void writeDoc(Document doc)
>                 throws IOException {
>             Directory directory = getDirectory();
>             IndexWriterConfig conf = new IndexWriterConfig(new EnglishAnalyzer());
>             IndexWriter writer = new IndexWriter(directory, conf);
>             writer.addDocument(doc);
>             writer.flush();
>             writer.close();
>         }
>
>         private static Directory getDirectory() throws IOException {
>             return new MMapDirectory(Path.of("lucenttest"));
>         }
>     }
>
> Experimenting shows that if the following three properties are not set on
> the FieldType, the exception is no longer thrown, but removing them breaks
> functionality we have that depends on the position and offset info.
>
>  FIELD_TYPE.setStoreTermVectorPayloads(true);
>  FIELD_TYPE.setStoreTermVectorPositions(true);
>  FIELD_TYPE.setStoreTermVectorOffsets(true);
>
> Perhaps I'm doing something I shouldn't be. Thanks in advance for any help!
>
> Regards,
> Albert
>
>
>
> Albert MacSweeny
> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
> Tel. +358 (0)9 855 98 000  Mob. +353 (0)87 664 2560
> Internet: http://www.profium.com
