Thank you Gautam!

This works. Now I went back to Lucene and I'm hitting the wall.

In James, each flags document gets an "id" constructed as
"flags-<mailboxId>-<uid>" (e.g. "<id:flags-1-1>").

I run the code that updates the documents with flags and afterwards
check the result. The simple code below opens a new reader from the
writer (so it should be OK and should see the new state):

```
try (IndexReader reader = DirectoryReader.open(luceneMessageSearchIndex.writer)) {
    System.out.println("maxDoc: " + reader.maxDoc());
    IndexSearcher searcher = new IndexSearcher(reader);
    System.out.println("maxDoc (after second flag): " + reader.maxDoc());

    // starting from "1" to avoid main mail document
    for (int i = 1; i < reader.maxDoc(); i++) {
        System.out.println(reader.storedFields().document(i));
    }

    var idQuery = new TermQuery(new Term("id", "flags-1-1"));
    var search = searcher.search(idQuery, 10000);
    System.out.println("Term search: " + search.scoreDocs.length
            + " items: " + Arrays.toString(search.scoreDocs));
}
```

So even though I search for the term "flags-1-1", it yields 0 results
(but there are already 2 documents with that ID).

The gist of the issue is that, for some reason, trying to update a
flags document only adds a new one instead of replacing it
(deleting/adding). My reasoning is that the term doesn't match the
field, so the update "fails" (it adds a new document for the same
term) when updating the document:

https://github.com/apache/james-project/pull/2342/files#diff-a7c2a3c5cdb7e4a2914c899409991e27df6b25ad54488f197bc533193e3a03d0R1267

The code looks OK; while debugging, the term shows up as "id: flags-1-1",
so it looks right (but that's only a visual string comparison). I thought
it could be the same tokenizer issue, but everywhere in the
code StringField is used for the id of the flags:

```
    private Document createFlagsDocument(MailboxMessage message) {
        Document doc = new Document();
        doc.add(new StringField(ID_FIELD,
            "flags-" + message.getMailboxId().serialize() + "-" + Long.toString(message.getUid().asLong()),
            Store.YES));
…
```

So the update based on

```
new Term(ID_FIELD, doc.get(ID_FIELD))
```

should hit that exact document - correct?
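
To go beyond a visual comparison of the term and the stored id, both strings can be dumped code point by code point, which makes invisible differences (trailing whitespace, zero-width characters, look-alike dashes) show up. The following is only a minimal sketch using the Java standard library; `TermDebug`, `describe` and `firstDifference` are made-up names for illustration and are not part of Lucene or James, and the "suspicious" value is a hypothetical example, not something observed in the index.

```java
import java.util.stream.Collectors;

// Hypothetical debugging helper; not part of Lucene or James.
final class TermDebug {

    /** Renders every code point as U+XXXX so invisible characters become visible. */
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    /** Returns the index of the first differing char, or -1 if both strings are equal. */
    static int firstDifference(String a, String b) {
        int common = Math.min(a.length(), b.length());
        for (int i = 0; i < common; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // Same prefix: either equal, or one string is longer than the other.
        return a.length() == b.length() ? -1 : common;
    }

    public static void main(String[] args) {
        String expected = "flags-1-1";
        // Assumed example value with a trailing zero-width space.
        String suspicious = "flags-1-1\u200B";
        System.out.println(describe(expected));
        System.out.println(describe(suspicious));
        System.out.println("first difference at index: " + firstDifference(expected, suspicious));
    }
}
```

In the debugger, the two inputs would be the text of the term passed to the update (`Term.text()`) and the stored value read back with `doc.get(ID_FIELD)`; a result other than -1 would point at a real difference between the two strings.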

Any pointers on how to debug this and see how/where the comparison is
done would be greatly appreciated, so that I can try to figure out why
the term doesn't match the documents and the update fails.
(I've been at it for a couple of days now and, while I've learned a great
deal about Lucene starting from absolutely zero knowledge, I think
I'm in over my head; stepping into Lucene with a debugger doesn't
help much as I don't know exactly what/where to look for :) )

w.

On 2024-08-10T10:21:21.000+02:00, Gautam Worah
<worah.gau...@gmail.com> wrote:

> Hey,
>
> Use a StringField instead of a TextField for the title and your test will
> pass.
> Tokenization which is enabled for TextFields, is breaking your fancy title
> into tokens split by spaces, which is causing your docs to not match.
>
> https://lucene.apache.org/core/9_11_0/core/org/apache/lucene/document/StringField.html
>
> Best,
> Gautam Worah.
> 
> On Sat, Aug 10, 2024 at 12:05 AM Wojtek <woj...@unir.se> wrote:
> 
>> Hi Froh,
>>
>> thank you for the information.
>>
>> I updated the code and re-open the reader - it seems that the update
>> is reflected and search for old document doesn't yield anything but
>> the search for new term fails.
>>
>> I output all documents (there are 2) and the second one has new title
>> but when searching for it no document is found even though it's the
>> same string that has been used to update the title.
>>
>> On 2024-08-10T01:21:39.000+02:00, Michael Froh <msf...@gmail.com> wrote:
>>  
>>> Hi Wojtek,
>>>
>>> Thank you for linking to your test code!
>>>
>>> When you open an IndexReader, it is locked to the view of the Lucene
>>> directory at the time that it's opened.
>>>
>>> If you make changes, you'll need to open a new IndexReader before those
>>> changes are visible. I see that you tried creating a new IndexSearcher, but
>>> unfortunately that's not sufficient.
>>>
>>> Hope that helps!
>>>
>>> Froh
>>>
>>> On Fri, Aug 9, 2024 at 3:25 PM Wojtek <woj...@unir.se> wrote:
>>>   
>>>> Hi all!
>>>>
>>>> There is an effort in Apache James to update to a more modern version of
>>>> Lucene (ref: https://github.com/apache/james-project/pull/2342). I'm digging
>>>> into the issue as other have done but I'm stumped - it seems that
>>>> `org.apache.lucene.index.IndexWriter#updateDocument` doesn't update
>>>> the document.
>>>>
>>>> Documentation
>>>> (https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/index/IndexWriter.html#updateDocument(org.apache.lucene.index.Term,java.lang.Iterable))
>>>> states:
>>>>
>>>> Updates a document by first deleting the document(s) containing term
>>>> and then adding the new document. The delete and then add are atomic
>>>> as seen by a reader on the same index (flush may happen only after
>>>> the add).
>>>>
>>>> Here is a simple test with it:
>>>> https://github.com/woj-tek/lucene-update-test/blob/master/src/test/java/se/unir/AppTest.java
>>>> but it fails.
>>>>
>>>> Any guidance would be appreciated because I (and others) have been hitting
>>>> wall with it :)
>>>>
>>>> --
>>>> Wojtek
