NightOwl888 commented on issue #793: URL: https://github.com/apache/lucenenet/issues/793#issuecomment-1781314014
> If I read the entire thread correctly, there was never a suggestion to just abandon 4.8, but instead to decide the API is stable and focus on bug fixes, then release 4.8 and figure out a different way to version the library so that API changes can be done later. This way, going from beta to release would mean the current feature set is stable, but without the guarantees of implementing 100% of the APIs of the Java version.
>
> Just to give an example, speaking only from my experience with the library, I personally was not aware of the desire to keep on-disk binary formats the same between Java and .NET. We are only using a subset of all this functionality, and we would definitely not be using the Java version, let alone on the same data. We don't care about Java Lucene at all; we just want a really good .NET full-text search engine (actually, we don't care about the on-disk format at all, as we are 100% in memory, but that's a different story).
>
> I respect the decision to do a line-by-line port of Java Lucene, but I would like to point out that porting the most relevant features would not necessarily lead to a "Frankenstein" version. Obviously any feature that goes into the codebase has to be well architected, and any technical dependencies for that feature have to be implemented properly. But if the goal were just to make the best .NET full-text search engine out there, omitting the long tail of rarely used features (so as not to spend 1800 hours on version 4.8) and focusing instead on the most popular features (again, building on a robust foundation) might serve the community better. This could perhaps lead to higher engagement from the community (in terms of collaboration/PRs and possibly funding).
> You could still use Java Lucene as a blueprint for the implementation, but with the additional insight into what turned out well there and what did not, without being burdened, as they have been, by keeping compatibility with less used and less well designed features.

You are making some assumptions that just aren't true here.

1. You are assuming that we have the high-level knowledge of each component to make such a derivative version.
2. You are assuming that we would have some way to keep the feature set in line with Lucene if it were not a line-by-line port.
3. You are assuming that we know which features our users find most valuable. While it is clear that a component such as `Lucene.Net.Analysis.Nori` (for Korean) will have very limited scope, it isn't so clear for more generalized components such as `Lucene.Net.ICU`, which are useful in many more scenarios that `Lucene.Net.Analysis.Common` simply doesn't cover.
4. You are assuming that we could get the tests to function the same way in .NET as they do in Java without a line-by-line port. Lucene has a custom test framework that uses repeatable randomized tests, and this test framework is upgraded between versions of Lucene along with the tests.

Without keeping the binary formats the same, we would have to recreate all of the corrupt indexes for the tests. Arguably, the index format is the one thing the Lucene team gave the most thought to in terms of making Lucene portable across programming languages. Granted, we could use the documented format and try to reinvent the wheel for the rest, but a lot of components would have to be analyzed at a high level so they could be recreated. In addition, Lucene has pluggable codecs, so a newer version of Lucene can read the binary format of an older version. This lets users upgrade the software first and upgrade the index later. Maybe you don't use this feature, but for users of high-availability apps, it is a must.
There are over 3000 code files in Lucene, and they are not well documented; it could easily take years of analysis before we even start writing anything. We wouldn't even have much of an idea which features are important and which are not without tons of analysis and research. And when we were finished, there would be no reasonable way to incorporate features from new versions of Lucene (which is what happened on the NUnit project).

As for upgrading a single feature ahead of where it is in Lucene, this is where we run into problems. Before porting it, we have no idea what other patches it depends upon and whether any of those depend on binary formats that have changed. So we could start porting to get the "future" feature into 4.8.0, only to find out later that it is incompatible, and all of the work porting that one feature would go out the window. It would take much longer to port Lucene feature by feature than it would to port the diff between two commits to get to a higher version. And we would always be sure to have a version that works (at least as well as it worked in Java).

> > We could alternatively move on to 4.8.0 release while keeping the Lucene.Net.ICU and components that depend on it unstable
>
> To be blunt, and with all respect, it might get hard to find funding for hundreds or thousands of dev hours fixing the ICU library to support rare scripts and languages until someone with a clear business case for it turns up. Just for comparison, if some company needed, say, vector-valued fields (just as a random example), they might have the resources to fund the maintainers directly or devote professional developers to work with you on implementing this feature. But as I understand it, you want to go to 9.something directly after 4.8, so maybe we'll see a lot more contributions coming in, as the field will be more open for new features.

That is true about funding.
But the fact of the matter is that ICU4N has had more funding than Lucene.NET, even though it is an alpha with unstable APIs and we are still working out how to properly package it. Maybe it is easier to get people to fund Lucene.NET if ICU4N is a done deal, but if Lucene.NET moves on without ICU4N, my fear is that ICU4N will never be released. It is a tough sell to "release" Lucene.NET 4.8.0 and then ask for funding to "finish" it (which basically means subsidizing ICU4N). And it doesn't seem right to sell people on the idea that we are collecting funding for the upgrade, only to shift that funding to finishing ICU4N. It is far easier to finish ICU4N first, then release it, then release Lucene.NET, then ask for Lucene.NET funding for the 1800 hours to upgrade it (which is a pretty well-defined scope). You are right that doing it in this order means there is less help on Lucene.NET, but that isn't really where the help is needed until the upgrade anyway. We have analyzed this pretty well, and this is by far the fastest path (even though it is taking years because of limited funding and help).

> > but unfortunately that means either splitting up the lucene-cli component or releasing it as stable with unstable dependencies
>
> If you have policies against pre-release libraries, this is probably also a no-go. I think policies like this are based on the assumption that pre-release means unstable implementation, while you mean unstable API. This is probably the core of this discussion, as it is clear that the code base is very stable from a bugs point of view.

For the most part, yes. There are a few intermittently failing tests we have yet to track down, but mostly we just have several APIs that are likely to break before the release. Since `lucene-cli` contains the utilities to maintain the index, it doesn't seem right to make it a pre-release when the rest of the code is released. But it is a command-line app, so it isn't as if anyone will depend on it directly.
`Lucene.Net.ICU` is another matter, though. I suspect it is the big companies that will require it most and those companies are the ones that are also most likely to have policies against pre-release libraries.