NightOwl888 commented on PR #645:
URL: https://github.com/apache/lucenenet/pull/645#issuecomment-1263040758

   Thanks for the contribution.
   
   There was a prior attempt at porting the nori analyzer on [the 
feature/analysis-nori 
branch](https://github.com/NightOwl888/lucenenet/tree/feature/analysis-nori). 
However, there were 2 issues preventing it from functioning on Lucene.Net 4.8.0:
   
   1. The FST implementation was redesigned between 4.8.0 and the version that 
nori was added in Lucene. This incompatibility made it impossible to debug the 
tests.
   2. BigDecimal doesn't exist in .NET.
   
   We are more concerned with getting past the first issue than the second, 
since it would be trivial to exclude `KoreanNumberFilter` and 
`KoreanNumberFilterFactory` from the project (commented out, with notes 
explaining that we are missing the `BigDecimal` dependency at the top of the 
file), but I will provide some info about how to proceed if you are interested 
in porting it.
   
   ## Issues to Fix
   
   1. Move this into a new project named `Lucene.Net.Analysis.Nori`, using 
`Lucene.Net.Analysis.Kuromoji` as a project template. We don't want to add 
large dependencies to `Lucene.Net.Analysis.Common`, since most people are using 
it.
   2. Add a comment at the top of each code file indicating the exact Lucene 
version it is ported from. Example: `// Lucene version compatibility level 
9.3.0`
   3. Bring over the tests from [the feature/analysis-nori 
branch](https://github.com/NightOwl888/lucenenet/tree/feature/analysis-nori) 
and add them to a new project named `Lucene.Net.Tests.Analysis.Nori` (or 
alternatively create a fresh port of the tests). Either way, please ensure the 
tests are up to date with the exact Lucene version that you are porting the 
nori analyzer from.
   4. The nori project is very similar to 
[Lucene.Net.Analysis.Kuromoji](https://github.com/apache/lucenenet/tree/b5ea527c5bd125dd1db34d8b914e1a5d72e08ffa/src/Lucene.Net.Analysis.Kuromoji)
 and 
[Lucene.Net.Analysis.SmartCn](https://github.com/apache/lucenenet/tree/b5ea527c5bd125dd1db34d8b914e1a5d72e08ffa/src/Lucene.Net.Analysis.SmartCn)
 projects. Please use similar coding styles (constants instead of static 
fields, properties instead of methods where appropriate, naming conventions, 
folder detection for custom dictionaries, etc.)
   5. After addressing the porting issues below, please ensure all of the tests 
are passing. The best way to do this is to [setup Java 
debugging](https://lucenenet.apache.org/contributing/how-to-setup-java-lucene-debugging.html)
 and step through the code. Note that you will need to use the version of Java 
Lucene you are doing the port from in order to get the same results. Lucene has 
made some changes to their setup procedure, so it may take some 
research/experimentation to do. Do note that we have [these 
versions](https://github.com/NightOwl888/lucene/branches) successfully running. 
Alternatively, you may compare the code files line by line, but it may not be 
possible to figure out how to make the tests pass if you use this approach.
   
   ## FST
   
   At the time I attempted [the feature/analysis-nori 
branch](https://github.com/NightOwl888/lucenenet/tree/feature/analysis-nori), 
the FST API seemed to fit, however, due to some design changes it produced 
completely different results than the version I had ported it from 
(unfortunately, I don't recall what version it is based upon). At the time I 
thought that FST was tied deeply into other Lucene components and having 
multiple incompatible versions in the project wouldn't work. However, I have 
since learned that FST is only used in specific scenarios that end users won't 
need to plug together, so having a copy of the latest FST in the 
`Lucene.Net.Analysis.Nori` project should work fine (I think).
   
   To be able to debug, we need to be able to step through the code and get FST 
to return the exact results from the Lucene version this is based upon. So, we 
need a fresh port of FST from the same version of Lucene that nori is based 
upon. The convention we are following is to put "extra" components such as this 
into a folder named `Support`, followed by the regular folder convention in 
Lucene.
   
   ```
   src
     |
     -- Lucene.Net.Analysis.Nori
       |
       -- (All nori code files/folders)
       |
       -- Support
         |
         -- Util
           |
           -- Fst
             |
             -- (All FST code files)
     |
     -- Lucene.Net.Tests.Analysis.Nori
       |
       -- (All nori test code files/folders)
       |
       -- Support
         |
         -- Util
           |
           -- Fst
             |
             -- (All FST test code files)
   ```
   
   Please be mindful that we will be using similar namespace conventions as we 
are currently (the namespace may not necessarily match the name of the project 
it belongs to). For now, please put the new FST port into the 
`Lucene.Net.Support.Util.Fst` namespace.
   
   ## BigDecimal
   
   We definitely don't want to take on a dependency to IKVM, both because it is 
large and because it doesn't support most of the .NET runtimes that we do. 
Please try the following (in order of preference):
   
   1. I suspect the primary reason `BigDecimal` was the goto class in Java is 
because Java doesn't have a `decimal` data type like .NET does. It has limited 
precision compared to `BigDecimal`, but would work for most use cases. As in 
the [the feature/analysis-nori 
branch](https://github.com/NightOwl888/lucenenet/tree/feature/analysis-nori), 
we can use a nullable `decimal?` to account for gaps between Java and .NET 
reference vs value types.
   2. We could attempt to wrap `BigInteger` into a `BigDecimal`. There is an 
implementation 
[here](https://gist.github.com/JcBernack/0b4eef59ca97ee931a2f45542b9ff06d), 
which was found [on StackOverflow](https://stackoverflow.com/a/13813535). 
Please include it in the `Support/Numerics` folder (`Lucene.Net.Numerics` 
namespace) and be mindful of comments where people have suggested improvements 
to the implementation. We can use a nullable `BigDecimal?` to account for gaps 
between Java and .NET reference vs value types.
   3. If neither of the above options works, I suggest including the code files 
for `KoreanNumberFilter` and `KoreanNumberFilterFactory` (and their tests) in 
the project, but commenting them out. We will simply not include an 
implementation in Lucene.Net.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to