Re: [I] When will the 4.8.0 version be released？ [lucenenet]

via GitHub Thu, 26 Oct 2023 02:10:26 -0700


NightOwl888 commented on issue #793:
URL: https://github.com/apache/lucenenet/issues/793#issuecomment-1780717782

First of all, the versioning scheme was decided some time ago and is in
fact [documented](https://lucenenet.apache.org/contributing/versioning.html)
and made part of the build. At this point I don't see any reason to go back and
revisit this scheme, which was part of the work that was done during the first
4.8.0 beta.

> By releasing yourself from this constraint you would have the flexibility
to release stable versions of the functionality that you have implemented
without waiting for 100% feature parity with a given upstream Java version.

> This way you may opt to never be 100% feature complete with Java Lucene
4.8 (for example), because the community is more in need for some 7.x features
that can then be prioritized over the long tail of rarely used 4.8 features
(just as a made up example). By following your own version scheme you can
instead document version X as "compatible with Lucene 4.8 minus features Y and
Z".

This assumes usability and API are the entire issue, but they are not.

Lucene.NET is the most difficult application I have ever had the pleasure of
debugging in my 25 years as a developer. When we go off the map like this, we
literally throw away our best debugging tool, which is to run the *same
version* of Lucene and Lucene.NET side by side to see where the execution paths
diverge. I don't have an answer for how we could debug if we combine different
versions of Lucene. Do you?
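
To make the side-by-side idea concrete, here is a minimal sketch of the comparison step. It assumes both runs have been instrumented to emit one trace line per step in the same format; the trace contents and the `first_divergence` helper are hypothetical and are not part of Lucene or Lucene.NET.

```python
from itertools import zip_longest


def first_divergence(java_trace, dotnet_trace):
    """Return (index, java_line, dotnet_line) for the first differing step.

    Returns None when both traces are identical. A shorter trace is padded
    with None, so a run that stops early also counts as a divergence.
    """
    for i, (a, b) in enumerate(zip_longest(java_trace, dotnet_trace)):
        if a != b:
            return i, a, b
    return None


# Hypothetical trace lines, made up for illustration only.
java_trace = ["open segment _0", "read term 'apple'", "docFreq=3", "score=1.25"]
dotnet_trace = ["open segment _0", "read term 'apple'", "docFreq=2", "score=0.98"]

print(first_divergence(java_trace, dotnet_trace))
# prints (2, 'docFreq=3', 'docFreq=2')
```

This only works when both sides are ports of the same commit. With mixed versions, the traces would differ for legitimate reasons and the first mismatch would tell us nothing.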

Furthermore, the binary structure of the index does change from one version
to the next, making them incompatible and making it literally impossible to
bring many Lucene 9.x features back to Lucene.NET 4.x. We had this issue with
back-porting the [analyzers-nori](https://github.com/apache/lucenenet/pull/645)
package.

We have 100% compatibility with creating an index in Lucene and opening it
in Lucene.NET with the same version and plan to keep it that way going forward
(and it worked once the other way around, but hasn't been tested in quite a
while). The index isn't the only binary format that is kept in sync
between versions.

There are other problems with disjointed versioning between Lucene and
Lucene.NET. Case in point: Lucene.NET 3.0.3. There was no release of Lucene
3.0.3. Despite trying to sleuth an answer I have no idea what commit Lucene.NET
3.0.3 is a port of. I could *guess* that it is a port from 3.0.1 (which
actually was released), but I can't be 100% sure. I didn't even know what
commit in this repo corresponded to the 3.0.3 release until I found it on an
obscure blog (they released 3.0.3 RC2 by renaming it, but didn't make a tag
corresponding to the 3.0.3 release). Both of these issues are the primary
reason we have never done a maintenance release of Lucene.NET 3.0.3. While we
could incorporate the *actual* version number as part of the
`InformationalVersion` and make it disjointed, it would be very confusing for
users who see numbers that overlap Lucene releases that don't correspond to
them or their binary formats. Strict version compatibility avoids getting into
this situation again.

For usability, there are also issues. Existing Lucene blog posts may not be
useful if the API is different from that of the major version of Lucene the
post is about.

The bottom line is there is no maintenance plan for making a Frankenstein
version of Lucene that incorporates features from different versions. The best
way is to try to sync the entire project to a single Git commit. The story goes
way beyond keeping the API in sync. It also means keeping the execution paths,
binary formats, tests, and documentation in sync.

While we could simply abandon 4.8.0 and start working on the latest version
of Lucene now, we would be stuck in a situation where we have all of the same
work to finish we do now **plus** an estimated 1800 hours of upgrading work.
This upgrade estimate could be off if we run into any major gaps that mean more
JDK features we need to find or build replacements for. Right now, we are in a
situation where our remaining work still has an undefined scope because of gaps
that we may not know about. The plan is to try to close all of the gaps so when
we finally do start working on the upgrade we have a mostly well-defined scope
of work instead of a fuzzy "research this and figure out what we need to do
here" situation, where research is often most of the work (meaning to create an
issue about it, we need to do most of the work first to define the scope of the
issue).

Also, that seems like a total waste. Most of the work that is
remaining is on ICU4N. I have almost convinced myself that we may be able to
release ICU4N as stable earlier by not strictly following the [ICU versioning
scheme](https://unicode-org.github.io/icu/userguide/icu/design.html#other-icu-design-principles)
but instead allowing each major release to have breaking API changes until we
stabilize it (we are 13 versions behind so we have some wiggle room, but it
does mean we will have to do a full upgrade every time we make a breaking API
change). But we should probably still conditionally compile out the "draft"
APIs and other APIs that are considered unstable in the NuGet package or at
least make them invisible to the IDE. There are still other issues to deal
with, such as the fact that NuGet doesn't actually deploy resource files for
cultures it doesn't recognize. There are many decisions to make like that in
ICU4N where there are gaps between Java and .NET. Unfortunately,
nobody here seems willing to talk about the *actual* work that remains.
Most want to move on to the next version of Lucene and pretend that we don't
need to do this work for the upgrade, anyway.

We could alternatively move on to 4.8.0 release while keeping the
`Lucene.Net.ICU` and components that depend on it unstable, but unfortunately
that means either splitting up the `lucene-cli` component or releasing it as
stable with unstable dependencies. I would argue we need to focus 100% on the
remaining things that could break the API before we do such a thing (such as
automated query parser generation), which could still be time-consuming. It
also means we won't have a completely stable 4.8.0 release; the first fully
stable release might be something like 4.8.0.17. Or else we would need to set up
our build to make separate stable and unstable release packages to comply with
the Apache [release
procedure](https://lucenenet.apache.org/contributing/make-release.html). And we
still wouldn't technically be able to start working on upgrading until we have
a stable ICU4N, anyway. I don't see how this improves the situation; it only
adds more work to do to make it stable and makes the versioning history more
difficult to understand.

It really sucks for us to have to reject what would ordinarily be good ideas
from the community, but unfortunately, most of these ideas don't take
everything into consideration, only the "normal stuff" that most projects
deal with.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
