NightOwl888 commented on issue #793:
URL: https://github.com/apache/lucenenet/issues/793#issuecomment-2291774218

   > > Furthermore, the binary structure of the index does change from one 
version to the next, making them incompatible and making it literally 
impossible to bring many Lucene 9.x features back to Lucene.NET 4.x. We had 
this issue with back-porting the 
[analyzers-nori](https://github.com/apache/lucenenet/pull/645) package.
   > > We have 100% compatibility with creating an index in Lucene and opening 
it in Lucene.NET with the same version and plan to keep it that way going 
forward (and it worked once the other way around, but hasn't been tested in 
quite a while). The index isn't the only binary format that is also kept in 
sync between versions.
   > 
   > @NightOwl888 I am a Lucene Java programmer myself and am happy to help in 
any efforts to maintain two-way compatibility between Lucene and Lucene.NET.
   
   @superkelvint - Sorry for the late reply. I didn't see your comment back in 
April.
   
   Thanks for offering to help with compatibility. One way you might be able to 
help us is to add support (even if it is unofficial) for reading 4.8.0 codecs 
to the latest version of Lucene. The [backward-codecs 
package](https://github.com/apache/lucene/tree/releases/lucene-solr/8.8.1/lucene/backward-codecs/src/java/org/apache/lucene/codecs)
 only goes back to Lucene 5.x.
   
   Our plan is to jump ahead to the current version once Lucene.NET 4.8.0 is 
stable, so it would be beneficial if Lucene.NET users could upgrade 
the software first and upgrade their index at some later point. It would save 
us some time if we didn't have to grab the 4.x codecs from the last version 
that supported them and try to splice them into the backward-codecs package, 
as offhand I don't really know what is involved.
   
   Another way you could help us (since you linked to the issue) is to provide 
some guidance on the 
[analysis-nori](https://github.com/apache/lucenenet/pull/645) module ([latest 
work 
here](https://github.com/NightOwl888/lucenenet/tree/feature/analysis-nori-2)). 
We got most of it working, but there are 3 test failures that we could not 
find the cause of. The tests are 
[`TestRandomHugeStringsMockGraphAfter`](https://github.com/NightOwl888/lucenenet/blob/feature/analysis-nori-2/src/Lucene.Net.Tests.Analysis.Nori/TestKoreanTokenizer.cs#L447),
[`TestUserDict`](https://github.com/NightOwl888/lucenenet/blob/feature/analysis-nori-2/src/Lucene.Net.Tests.Analysis.Nori/TestKoreanTokenizerFactory.cs#L95),
 and 
[`TestLookup`](https://github.com/NightOwl888/lucenenet/blob/feature/analysis-nori-2/src/Lucene.Net.Tests.Analysis.Nori/Dict/UserDictionaryTest.cs#L29).
 The biggest issue is that it is ported from Lucene 8.2.0 and the FST 
implementation has completely changed since 4.8.0. I tried recreating the 
UserDictionary with our ported code, but the `TestUserDict` test still doesn't 
pass. I also tried porting over the earliest version of the module, but the FST 
implementation had already changed by then.
   
   Now, since the kuromoji module is almost identical and it runs on 4.8.0, I 
suspect there is a solution. I have already [asked the Lucene 
team](https://lists.apache.org/thread/b5nt4hwbkxo5s75z32kp1ocg87q2qoq8), but 
their advice was just to wait until we upgrade. However, if we have someone who 
is willing to help us find a solution, maybe we can make this available sooner.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

