On Sat, Oct 4, 2014 at 12:35 PM, Jack Krupansky <[email protected]> wrote:
> I tried to follow all of the trunk 6/branch 5x discussion, but... AFAICT
> there was no explicit decision or even implication that a release 5.0 would
> be imminent or that there would not be a 4.11 release. AFAICT, the whole
> trunk 6/branch 5x decision was more related to wanting to have a trunk that
> eliminated the 4x deprecations and was no longer constrained by
> compatibility with the 4x index – let me know if I am wrong about that in
> any way! But I did see a comment on one Jira referring to “preparation for a
> 5.0 release”, so I wanted to inquire about intentions. So, is a 5.0 release
> “coming soon”, or are 4.11, 4.12, 4.13... equally likely?

I created a branch_5x because 3.x index support was responsible for
multiple recent corruption bugs, some of which started impacting 4.x
indexes.

Especially bad were:
LUCENE-5907: 3.x back compat code corrupts (not just can't read) your index.
LUCENE-5934: 3.x back compat code corrupts (not just can't read) your 4.0 index.
LUCENE-5975: 3.x back compat code reports a false corruption for
3.0-3.3 indexes (the underlying problem was indeed a bug in those
versions of Lucene).

Whenever I see a pattern in corruptions, I treat it as a systemic
problem and work aggressively to do something about it. I've seen
several lately, but these are the relevant ones:

3.x back compat: 3.x didn't have a codec API, so it's wedged in, and
pretty hard. It's not that we were lazy; it's that the format is
radically different: it doesn't separate data by fields, sorts terms
differently, uses shared docstores, writes field numbers implicitly,
... We try to emulate it as best we can for testing, but the emulation
can't really be perfect, so in such places: surprise, bugs. The only
way to stop these corruptions is to stop supporting it.

test infrastructure: IMO lucene 4 wasn't really ready to support
multiple index formats from a test perspective, so we cheated: we
emulate old formats and rotate them across all tests. This works OK,
but it's horrible to debug (since these are essentially integration
tests), the false failure rate is extremely high, and the
implementation is complex. It's not just that it fails to find some
bugs; it was actually directly responsible for corruption bugs like
LUCENE-5377. But throughout 4.x, we have fixed the situation by adding
BaseXYZFormat tests for each part of an index format. Now we have
reliable unit tests for each part of the abstract codec API: adding
new tests here finds old bugs and prevents new ones in the future. For
example, just in the last few days I fixed several minor bugs in 4.x's
CFS code with this approach.

There are also other patterns, like deleting files, commit fallback
logic, exception handling, addIndexes, etc., that we have put
substantial work into recently for 5.0. Whatever was safe to backport
to bugfix releases, we tried to backport, but some of these kinds of
"fixes" are just too heavy for a bugfix branch, and many cannot be
done at all as long as 3.x support exists. There is also some
hardening in the 5.0 index format itself that really could not be done
correctly as long as we must support 3.x.

So it's not just that 3.x causes corruption bugs; it also prevents us
from moving forward and actually tackling these other issues. This is
important to do, or we will just continue to "tread water" and never
actually get ahead of them. So I did something about it and created a
5.x branch. Worst case, nobody would follow along, but I guess I just
assumed the situation was widely understood.

>
> Open questions: What is Heliosearch up to, and what are Elasticsearch’s
> intentions?
>

I don't see how this is relevant. The straw that broke the camel's
back for me was LUCENE-5934, and it doesn't impact Elasticsearch.
