NightOwl888 commented on issue #793:
URL: https://github.com/apache/lucenenet/issues/793#issuecomment-1781314014

   > If I read the entire thread correctly, there was never a suggestion to 
just abandon 4.8, but instead to decide the API is stable and focus on bug 
fixes, then release 4.8 and figure out a different way to version the library 
so that API changes can be done later. This way, going from beta to release 
would mean the current feature set is stable, but without the guarantees of 
implementing 100% of the APIs of the Java version.
   >
   > Just to give an example, speaking only from my experience with the 
library, I personally was not aware of the desire to keep on-disk binary 
formats the same between Java and .NET. We are only using a subset of all this 
functionality, and we would definitely not be using the Java version, let alone 
on the same data. We don't care about Java Lucene at all, we just want a really 
good .NET full text search engine (actually we don't care about on-disk format 
at all as we are 100% in memory, but that's a different story).
   
   > I respect the decision to do a line-by-line port of Java Lucene, but I do 
like to point out that porting the most relevant features would not necessarily 
lead to a "Frankenstein" version. Obviously any feature that goes into the 
codebase has to be well architected, and any technical dependencies for this
feature have to be implemented properly. But consider if the goal was just to 
make the best .NET full text search engine out there, maybe omitting the long 
tail of rarely used features to not have to spend 1800 hours on version 4.8, 
instead focusing on the most popular features (again, building on robust 
foundation) may be serving the community better. This could perhaps lead to a 
higher engagement from the community (in terms of collaboration/PRs and 
possibly funding). You could still use Java Lucene as a blueprint for the 
implementation, but with the additional insight in what turned out well and 
what did not turn out so well there, without being burdened, as they have been, by
keeping compatibility even with less-used and less well-designed features.
   
   You are making some assumptions that just aren't true here.
   
   1. You are assuming that we have the high-level knowledge of each component 
needed to make such a derivative version.
   2. You are assuming that we would have some way to keep the feature set in 
line with Lucene if it were not a line-by-line port.
   3. You are assuming that we know which features our users find most 
valuable. While it is clear that a component such as `Lucene.Net.Analysis.Nori` 
(for Korean) will have very limited scope, it isn't so clear for more 
generalized components such as `Lucene.Net.ICU`, which are useful in many 
scenarios that `Lucene.Net.Analysis.Common` simply doesn't cover.
   4. You are assuming that we could get the tests to function the same way in 
.NET as they do in Java without a line-by-line port. Lucene has a custom test 
framework that uses repeatable randomized tests. This test framework is 
upgraded between versions of Lucene along with the tests.
   
   Without keeping the binary formats the same, we would have to recreate all 
of the corrupt indexes used by the tests. Arguably, the index format is the one 
thing the Lucene team gave the most thought to when it came to making Lucene 
portable across programming languages. Granted, we could use the documented 
format and try to reinvent the wheel for the rest, but there are a lot of 
components that would have to be analyzed at a high level so they could be 
recreated.
   
   In addition, Lucene has pluggable codecs, so a newer version of Lucene 
can read the binary format written by an older version. This lets users upgrade 
the software first and then upgrade the index later. Maybe you don't use this 
feature, but for users of high-availability apps, this feature is a must.
   
   There are over 3000 code files in Lucene, and it is not well documented; it 
could easily take years of analysis before we even start writing anything. We 
wouldn't even have much of an idea which features are important and which are 
not without tons of analysis and research. And when we were finished, there 
would be no reasonable way to incorporate features from new versions of Lucene 
(which is what happened on the NUnit project).
   
   Upgrading a single feature ahead of where it is in Lucene is exactly 
where we run into problems. Before porting it, we have no idea what other 
patches it depends upon and whether any of those depend on binary formats that 
have changed. So we could start porting a "future" feature into 4.8.0, only to 
find out later that it is incompatible, and all of the work porting that one 
feature would go out the window. It would take much longer to port Lucene 
feature by feature than it would to port the diff between two commits to get 
to a higher version. And with the diff approach, we would always be sure to 
have a version that works (at least as well as it worked in Java).
   
   > >    We could alternatively move on to 4.8.0 release while keeping the 
Lucene.Net.ICU and components that depend on it unstable
   
   > To be blunt, and with all respect, it might get hard to find funding for 
hundreds or thousands of dev hours fixing the ICU library to support rare 
scripts and languages, until someone with a clear business case for it turns 
up. Just for comparison, if some company needed, say, vector valued fields 
(just as a random example) they might have the resources to fund the 
maintainers directly or devote professional developers to work with you on 
implementing this feature. But since, as I understand it, you want to go to 
9.something directly after 4.8, maybe we'll see a lot more contributions coming 
in as the field will be more open for new features.
   
   That is true about funding. But the fact of the matter is that ICU4N has had 
more funding than Lucene.NET, even though it is an alpha with unstable APIs and 
we are still working out how to properly package it. Maybe it is easier to get 
people to fund Lucene.NET if ICU4N is a done deal, but if Lucene.NET moves on 
without ICU4N, my fear is that ICU4N will never be released.
   
   It is a tough sell to "release" Lucene.NET 4.8.0 and then ask for funding to 
"finish" it (which is basically to subsidize ICU4N). And it doesn't seem right 
to sell people on the idea that we are collecting funding for the upgrade only 
to shift that funding to finish ICU4N. It is far easier to finish ICU4N first, 
then release it, then release Lucene.NET, then ask for Lucene.NET funding for 
the 1800 hours to upgrade it (which is a pretty well defined scope).
   
   You are right in that doing it in this order means there is less help on 
Lucene.NET, but that isn't really where the help is needed until the upgrade 
anyway. We have analyzed this pretty well and this is by far the fastest path 
(even though it is taking years because of limited funding and help).
   
   > > but unfortunately that means either splitting up the lucene-cli 
component or releasing it as stable with unstable dependencies
   
   > If you have policies against pre-release libraries this is probably also a 
no go. I think policies like this are based on the assumption that pre-release 
means unstable implementation, while you mean unstable API. This is probably 
the core of this discussion, as it is clear that the code base is very stable 
from a bugs point of view.
   
   For the most part, yes. There are a few intermittently failing tests we have 
yet to track down. Mostly, we just have several APIs that are likely to undergo 
breaking changes before the release.
   
   Since `lucene-cli` contains the utilities to maintain the index, it doesn't 
seem right to make it a prerelease when the rest of the code is released. But 
it is a command-line app, so it isn't like anyone will depend on it directly. 
`Lucene.Net.ICU` is another matter, though. I suspect it is the big companies 
that will require it most and those companies are the ones that are also most 
likely to have policies against pre-release libraries.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.