Hi, thanks for the link and indeed https://issues.apache.org/jira/browse/LUCENE-7171 / https://github.com/apache/lucene/issues/8226 seems to be the issue here.
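
For anyone who finds this thread later, here is a minimal, self-contained sketch of the round-trip that the linked issue describes - the "id"/"flags-1-1" field matches the flags document discussed below, but the in-memory directory and `StandardAnalyzer` are just stand-ins for illustration, not what James actually configures:

```
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class StoredFieldRoundTrip {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            // "id" is an untokenized StringField, so the exact term id:flags-1-1 is indexed.
            Document doc = new Document();
            doc.add(new StringField("id", "flags-1-1", Field.Store.YES));
            writer.addDocument(doc);
            writer.commit();

            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                // Reading the document back restores only stored values; the StringField's
                // "don't tokenize" metadata is gone, as the javadoc NOTE quoted below warns.
                Document roundTripped = reader.storedFields().document(0);

                // Re-indexing the retrieved copy runs "flags-1-1" through the analyzer, so the
                // exact term that the delete side of updateDocument relies on may no longer
                // exist for the re-added document.
                writer.updateDocument(new Term("id", roundTripped.get("id")), roundTripped);
                writer.commit();
            }

            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                int hits = new IndexSearcher(reader).count(new TermQuery(new Term("id", "flags-1-1")));
                System.out.println("exact-term hits after round-trip: " + hits); // likely 0
            }
        }
    }
}
```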
> Maybe try a simple `new TermQuery(new Term("id", "flags-1-1"))` query during update and see if it returns the correct ans?

That was the thing - it didn't, hence the failing update. In the end, however, I took a step back: instead of matching on the string ID field (constructed from the mailbox-id and message-id, which are also stored), I re-used the query that is originally used to find the Document to update, and switched the update call from the Term-based `org.apache.lucene.index.IndexWriter#updateDocument()` to the Query-based `org.apache.lucene.index.IndexWriter#updateDocuments()`. Re-using the same query works :) Another benefit is one less field stored in the Document :) A rough sketch of the change follows.
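
This is roughly what the `update()` method in `LuceneMessageSearchIndex` looks like after the change - a paraphrased sketch, not the exact committed code; the field constants, `createQuery()` and `indexFlags()` are the existing helpers from the class quoted below, and `List` is just `java.util.List`:

```
private void update(MailboxId mailboxId, MessageUid uid, Flags f) throws IOException {
    try (IndexReader reader = DirectoryReader.open(writer)) {
        IndexSearcher searcher = new IndexSearcher(reader);

        // Same query as before - it is what actually finds the flags document...
        BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();
        queryBuilder.add(new TermQuery(new Term(MAILBOX_ID_FIELD, mailboxId.serialize())), BooleanClause.Occur.MUST);
        queryBuilder.add(createQuery(MessageRange.one(uid)), BooleanClause.Occur.MUST);
        queryBuilder.add(new PrefixQuery(new Term(FLAGS_FIELD, "")), BooleanClause.Occur.MUST);
        BooleanQuery query = queryBuilder.build();

        TopDocs docs = searcher.search(query, 100000);
        for (ScoreDoc sDoc : docs.scoreDocs) {
            Document doc = searcher.storedFields().document(sDoc.doc);
            doc.removeFields(FLAGS_FIELD);
            indexFlags(doc, f);

            // Retrieved documents only carry stored values, so re-create the uid DocValues/point fields.
            long uidValue = doc.getField(UID_FIELD).numericValue().longValue();
            doc.removeFields(UID_FIELD);
            doc.add(new NumericDocValuesField(UID_FIELD, uidValue));
            doc.add(new LongPoint(UID_FIELD, uidValue));
            doc.add(new StoredField(UID_FIELD, uidValue));

            // ...and the same query is re-used as the delete criterion, instead of the old
            // writer.updateDocument(new Term(ID_FIELD, doc.get(ID_FIELD)), doc).
            writer.updateDocuments(query, List.of(doc));
        }
    }
}
```

The Term-based call relied on the exact-match `id` term, which no longer exists once the retrieved copy has been re-tokenized, so deleting by the query side-steps the problem entirely. (`IndexSearcher#storedFields()` is just the non-deprecated way of loading the stored document that Gautam linked; the old `searcher.doc()` works the same way here.)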
On 2024-08-11T02:20:38.000+02:00, Gautam Worah <worah.gau...@gmail.com> wrote:
> I'm confused as to what could be happening.
>
> Google led me to this StackOverflow link:
> https://stackoverflow.com/questions/36402235/lucene-stringfield-gets-tokenized-when-doc-is-retrieved-and-stored-again
> which references some longstanding old issues about fields changing their "types" and so on.
>
> The docs mention: `NOTE: only the content of a field is returned if that field was stored during
> indexing. Metadata like boost, omitNorm, IndexOptions, tokenized, etc., are not preserved.`
>
> Can you check what `doc.get(ID_FIELD)` returns, and if it looks right?
> Maybe try a simple `new TermQuery(new Term("id", "flags-1-1"))` query during update and see if it
> returns the correct ans?
>
> If the value is not right, perhaps you may have to use the original stored value:
> https://lucene.apache.org/core/9_11_0/core/org/apache/lucene/search/IndexSearcher.html#storedFields()
> for crafting the `updateDocument()` call.
>
> Best,
> Gautam Worah.
>
> On Sat, Aug 10, 2024 at 3:12 PM Wojtek <woj...@unir.se> wrote:
>> Hi,
>>
>> thank you for the reply and apologies for being somewhat "all over the place".
>>
>> Regarding "tokenization" - should it happen if I use StringField?
>>
>> When the document is created (before writing) I see in the debugger that it's not tokenized and
>> is of type StringField:
>>
>> ```
>> doc = {Document@4830} "Document<stored,indexed,omitNorms,indexOptions=DOCS<id:flags-1-1>>"
>>   fields = {ArrayList@5920} size = 1
>>     0 = {StringField@5922} "stored,indexed,omitNorms,indexOptions=DOCS<id:flags-1-1>"
>> ```
>>
>> But once in the update method (document retrieved from the search) I see it has changed to
>> StoredField and is already "tokenized":
>>
>> ```
>> doc = {Document@6526} "Document<stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:flags-1-1>
>>     stored,indexed,tokenized,omitNorms,indexOptions=DOCS<mailboxid:1>
>>     stored,indexed,omitNorms,indexOptions=DOCS<flags:\FLAG>
>>     docValuesType=NUMERIC<uid:1> LongPoint <uid:1> stored<uid:1>>"
>>   fields = {ArrayList@6548} size = 6
>>     0 = {StoredField@6550} "stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:flags-1-1>"
>>     1 = {StoredField@6551} "stored,indexed,tokenized,omitNorms,indexOptions=DOCS<mailboxid:1>"
>>     2 = {StringField@6552} "stored,indexed,omitNorms,indexOptions=DOCS<flags:\FLAG>"
>>     3 = {NumericDocValuesField@6553} "docValuesType=NUMERIC<uid:1>"
>>     4 = {LongPoint@6554} "LongPoint <uid:1>"
>>     5 = {StoredField@6555} "stored<uid:1>"
>> ```
>>
>> The code that adds the documents is a method implemented in James,
>> `org.apache.james.mailbox.lucene.search.LuceneMessageSearchIndex#add`
>> (https://github.com/apache/james-project/blob/85ec4fbfe20637ce50b469ccaf394e6a8509ad6b/mailbox/lucene/src/main/java/org/apache/james/mailbox/lucene/search/LuceneMessageSearchIndex.java#L1240),
>> which looks fairly straightforward:
>>
>> ```
>> public Mono<Void> add(MailboxSession session, Mailbox mailbox, MailboxMessage membership) {
>>     return Mono.fromRunnable(Throwing.runnable(() -> {
>>         Document doc = createMessageDocument(session, membership);
>>         Document flagsDoc = createFlagsDocument(membership);
>>         writer.addDocument(doc);
>>         writer.addDocument(flagsDoc);
>>     }));
>> }
>> ```
>>
>> as does the method that creates the flags document
>> (https://github.com/apache/james-project/blob/85ec4fbfe20637ce50b469ccaf394e6a8509ad6b/mailbox/lucene/src/main/java/org/apache/james/mailbox/lucene/search/LuceneMessageSearchIndex.java#L1290):
>>
>> ```
>> private Document createFlagsDocument(MailboxMessage message) {
>>     Document doc = new Document();
>>     doc.add(new StringField(ID_FIELD, "flags-" + message.getMailboxId().serialize() + "-"
>>         + Long.toString(message.getUid().asLong()), Store.YES));
>>     doc.add(new StringField(MAILBOX_ID_FIELD, message.getMailboxId().serialize(), Store.YES));
>>     doc.add(new NumericDocValuesField(UID_FIELD, message.getUid().asLong()));
>>     doc.add(new LongPoint(UID_FIELD, message.getUid().asLong()));
>>     doc.add(new StoredField(UID_FIELD, message.getUid().asLong()));
>>     indexFlags(doc, message.createFlags());
>>     return doc;
>> }
>> ```
>>
>> As you can see, `StringField` is used when creating the document, and to the best of my knowledge
>> and based on what I was told it _should_ not be tokenized (?).
>>
>> The update (in which the document can't be updated because the Term doesn't seem to find it) is
>> done in `org.apache.james.mailbox.lucene.search.LuceneMessageSearchIndex#update()`
>> (https://github.com/apache/james-project/blob/85ec4fbfe20637ce50b469ccaf394e6a8509ad6b/mailbox/lucene/src/main/java/org/apache/james/mailbox/lucene/search/LuceneMessageSearchIndex.java#L1259):
>>
>> ```
>> private void update(MailboxId mailboxId, MessageUid uid, Flags f) throws IOException {
>>     try (IndexReader reader = DirectoryReader.open(writer)) {
>>         IndexSearcher searcher = new IndexSearcher(reader);
>>         BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();
>>         queryBuilder.add(new TermQuery(new Term(MAILBOX_ID_FIELD, mailboxId.serialize())),
>>             BooleanClause.Occur.MUST);
>>         queryBuilder.add(createQuery(MessageRange.one(uid)), BooleanClause.Occur.MUST);
>>         queryBuilder.add(new PrefixQuery(new Term(FLAGS_FIELD, "")), BooleanClause.Occur.MUST);
>>         TopDocs docs = searcher.search(queryBuilder.build(), 100000);
>>         ScoreDoc[] sDocs = docs.scoreDocs;
>>         for (ScoreDoc sDoc : sDocs) {
>>             Document doc = searcher.doc(sDoc.doc);
>>             doc.removeFields(FLAGS_FIELD);
>>             indexFlags(doc, f);
>>             // somehow the document retrieved from the search lost the DocValues data for the
>>             // uid field, so we need to re-define the field with proper DocValues.
>>             long uidValue = doc.getField("uid").numericValue().longValue();
>>             doc.removeField("uid");
>>             doc.add(new NumericDocValuesField(UID_FIELD, uidValue));
>>             doc.add(new LongPoint(UID_FIELD, uidValue));
>>             doc.add(new StoredField(UID_FIELD, uidValue));
>>             writer.updateDocument(new Term(ID_FIELD, doc.get(ID_FIELD)), doc);
>>         }
>>     }
>> }
>> ```
>>
>> I was wondering if the Lucene/writer configuration is the culprit (which would result in
>> tokenizing even a StringField), but it looks fairly straightforward:
>>
>> ```
>> this.directory = directory;
>> this.writer = new IndexWriter(this.directory, createConfig(createAnalyzer(lenient), dropIndexOnStart));
>> ```
>>
>> where createConfig looks like this:
>>
>> ```
>> protected IndexWriterConfig createConfig(Analyzer analyzer, boolean dropIndexOnStart) {
>>     IndexWriterConfig config = new IndexWriterConfig(analyzer);
>>     if (dropIndexOnStart) {
>>         config.setOpenMode(OpenMode.CREATE);
>>     } else {
>>         config.setOpenMode(OpenMode.CREATE_OR_APPEND);
>>     }
>>     return config;
>> }
>> ```
>>
>> and createAnalyzer like this:
>>
>> ```
>> protected Analyzer createAnalyzer(boolean lenient) {
>>     if (lenient) {
>>         return new LenientImapSearchAnalyzer();
>>     } else {
>>         return new StrictImapSearchAnalyzer();
>>     }
>> }
>> ```
>>
>> On 2024-08-10T21:04:15.000+02:00, Gautam Worah <worah.gau...@gmail.com> wrote:
>>> Hey,
>>>
>>> I don't think I understand the email well but I'll try my best.
>>>
>>> I'm confused as to what could be happening.
>>>
>>>> Hi all!
>>>>
>>>> There is an effort in Apache James to update to a more modern version of Lucene
>>>> (ref: https://github.com/apache/james-project/pull/2342). I'm digging into the issue as others
>>>> have done, but I'm stumped - it seems that `org.apache.lucene.index.IndexWriter#updateDocument`
>>>> doesn't update the document.
>>>>
>>>> The documentation
>>>> (https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/index/IndexWriter.html#updateDocument(org.apache.lucene.index.Term,java.lang.Iterable))
>>>> states:
>>>>
>>>> Updates a document by first deleting the document(s) containing term and then adding the new
>>>> document. The delete and then add are atomic as seen by a reader on the same index (flush may
>>>> happen only after the add).
>>>>
>>>> Here is a simple test with it:
>>>> https://github.com/woj-tek/lucene-update-test/blob/master/src/test/java/se/unir/AppTest.java
>>>> but it fails.
>>>>
>>>> Any guidance would be appreciated because I (and others) have been hitting a wall with it :)
>>>>
>>>> --
>>>> Wojtek