NightOwl888 opened a new issue #460:
URL: https://github.com/apache/lucenenet/issues/460


   I don't know if this is an issue or a discussion yet, but it seems logical 
to document this somewhere in case we make it to release with gaps in support 
for NLP.
   
   First of all, Lucene 4.8.0 didn't support [Apache 
OpenNLP](https://opennlp.apache.org/); it supported [Apache 
UIMA](https://uima.apache.org/). So we followed a newer Lucene version (8.2.0) 
and chose OpenNLP instead of UIMA (which is [seemingly now 
part of the OpenNLP 
package](https://github.com/apache/opennlp/tree/master/opennlp-uima)).
   
   ## Options for NLP Support in .NET
   
   <table>
   <tbody>
   <tr>
   <th>Option</th>
   <th>Issues</th>
   <th>Notes</th>
   </tr>
   <tr>
   <td>Port OpenNLP from <a 
href="https://github.com/apache/opennlp/releases/tag/opennlp-1.9.1-rc2">version 
1.9.1 tag</a> to .NET</td>
   <td>
   <ul>
   <li>The project is large and would take a lot of effort to port and 
maintain.</li>
   </ul>
   </td>
   <td></td>
   </tr>
   <tr>
   <td>Use <a href="https://github.com/AlexPoint/OpenNlp">AlexPoint/OpenNlp</a> 
from <a href="https://www.nuget.org/packages/OpenNLP/">NuGet</a></td>
   <td>
   <ul>
   <li>The API has been refactored significantly from OpenNLP, so it would take a 
high-level analysis to work out how to use the new API</li>
   <li>It isn't clear which version of OpenNLP this corresponds to, as the version number 
doesn't seem to track the one in Java, but it is probably from long before 1.9.1 and 
seems to be missing features Lucene uses</li>
   <li>Currently only supports .NET Framework 4.5+</li>
   </ul>
   </td>
   <td></td>
   </tr>
   <tr>
   <td>Use <a 
href="https://sergey-tihon.github.io/Stanford.NLP.NET/#/">Stanford.NLP.NET</a></td>
   <td>
   <ul>
   <li>The API is significantly different from OpenNLP and it would take a 
high-level analysis to determine whether it has the features we need</li>
   <li>It is an IKVM port, which currently only supports .NET Framework 3.5</li>
   <li>Its GNU GPL v2 license is <a 
href="https://apache.org/legal/resolved.html#category-x">too restrictive to use 
in an Apache project</a> (we can depend on it, but not import its code)</li>
   </ul>
   </td>
   <td>There is a project called <a 
href="http://www.cs.cmu.edu/~ark/TweetNLP/index.html">Tweet NLP</a> that 
extends it and seems to supply much of the functionality Lucene uses</td>
   </tr>
   <tr>
   <td>Use <a 
href="https://github.com/IanMercer/AboditNaturalLanguage">AboditNLP</a> from <a 
href="https://www.nuget.org/packages/AboditNLP/">NuGet</a></td>
   <td>
   <ul>
   <li>A high-level analysis is required to determine if it supports the 
functionality Lucene uses.</li>
   <li>Closed-source, only demos and the NuGet package are available.</li>
   </ul>
   </td>
   <td>Targets .NET Framework 4.7.2, .NET Standard 2.0, and .NET Standard 
2.1.</td>
   </tr>
   <tr>
   <td>Use <a href="https://github.com/SciSharp/CherubNLP">CherubNLP</a> from 
<a href="https://www.nuget.org/packages/CherubNLP/">NuGet</a></td>
   <td>
   <ul>
   <li>Would require a high-level analysis to determine if it supports the 
functionality Lucene uses</li>
   </ul>
   </td>
   <td>Targets .NET Standard 2.0.</td>
   </tr>
   <tr>
   <td>Use <a 
href="https://github.com/sergey-tihon/OpenNLP.NET">OpenNLP.NET</a> from <a 
href="https://www.nuget.org/packages/OpenNLP.NET/">NuGet</a></td>
   <td>
   <ul>
   <li>It is an IKVM port, which currently only supports .NET Framework 3.5</li>
   </ul>
   </td>
   <td><b>This is the option we currently use. </b>Someone created a 
strong-named package named <a 
href="https://www.nuget.org/packages/OpenNLP.NET.Signed/">OpenNLP.NET.Signed</a>.
 It would be preferable to get the original package owner to strong-name the package, but I 
suppose that would mean incrementing to at least version 1.9.1.1, or upgrading 
to a newer version of OpenNLP.</td>
   </tr>
   </tbody>
   </table>
   
   There are some other options, but the ones listed above seem to be the most 
"official". However, there are currently no options for .NET Core/.NET 5+ 
support of OpenNLP with the same API as OpenNLP 1.9.1.
   
   ## IKVM
   
   Unfortunately, while IKVM has been a reasonable go-to way to quickly support 
Java-based apps in the past, it was [abandoned by its main 
contributor](http://weblog.ikvm.net/) in 2017 and has no .NET Core/.NET Standard 
support.
   
   There is an effort to get it working on .NET Core named 
[ikvm-revived](https://github.com/ikvm-revived/ikvm) (to which I have 
contributed), but it seems to have been stalled for about a year and, as of 
this writing, there isn't even a pre-release on NuGet. There is some 
debate about whether they should support .NET Framework, but even if they didn't, we would 
still be able to target the current OpenNLP.NET version on .NET Framework.
   
   See [NuGet Repository?](https://github.com/ikvm-revived/ikvm/issues/8)
   
   ## Alternatives to IKVM
   
   There was an announcement on the Microsoft Blog about .NET 5 supporting 
interoperability with Java, but it isn't clear what they meant by that.
   
   
https://devblogs.microsoft.com/dotnet/announcing-net-5-0-preview-1/#comment-4932
   
   In fact, others are mentioning in the comments they cannot use NLP on .NET 
Core and are hoping to resolve that in .NET 5.
   
   I have searched but cannot find any examples of how .NET 5 
supports Java interop. If it does, that would probably be a better path 
forward than IKVM for NLP support. However, it sounds as if this feature was 
punted from the official .NET 5 release.
   
   ## Current Support for NLP in Lucene.NET
   
   Since we are depending on the IKVM-based 
[OpenNLP.NET](https://github.com/sergey-tihon/OpenNLP.NET) project, our current 
support is limited to .NET Framework 4.5.1+.
   
   We do have some minor issues (namely lack of `InternalsVisibleTo` support) 
due to the fact that the library is not strong-named, but these are internal. 
Time will tell if lack of strong-naming is going to be an issue for end users, 
but ideally to get strong naming we should contribute to OpenNLP.NET rather 
than using the strong-named clone named 
[OpenNLP.NET.Signed](https://www.nuget.org/packages/OpenNLP.NET.Signed/).
   
   Most options for supporting NLP on .NET Core would require some work to put 
into play, and it isn't clear how much work is involved to analyze this at a 
high level. It also isn't clear how big the demand for this functionality will 
be.
   
   While we could make an effort to change dependencies, it would be sensible 
to create a new assembly named after the new dependency (in the `src/dotnet` 
folder) so it is clear what it depends on and leave the existing 
Lucene.Net.Analysis.OpenNLP project as-is.
   
   Another option is just to wait to see whether `ikvm-revived` releases a 
.NET Core targeted package on NuGet and then support it when they finally do.
   
   > NOTE: If we bring back support for native .NET Collation in 
Lucene.Net.Analysis.Common, it is possible that its `SortKey`s would not be 
portable between .NET Framework and .NET Core/.NET 5+ (see [Caveats and 
Comparisons](https://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/collation/package-summary.html)).
 If we don't have .NET Core/.NET 5 support for Lucene.Net.Analysis.OpenNLP, that 
collator option could cause some issues if indexing can only be done on .NET 
Framework, but searching is done on .NET Core or .NET 5. However, we have a 
[collator in 
Lucene.Net.ICU](https://lucenenet.apache.org/docs/4.8.0-beta00014/api/icu/Lucene.Net.Analysis.Icu.html#collation)
 that is stable across .NET target frameworks that could be used instead in 
that scenario.

