[
https://issues.apache.org/jira/browse/LUCENE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142373#comment-14142373
]
Michael McCandless commented on LUCENE-5952:
--------------------------------------------
bq. An alternative would be to put both static methods into CodecUtils, but
this would also not help with changes in format.
Or I can make the SIWriter do its own (private) thing. Yeah, that's an
"abstraction violation" (public Version ctor), and, yeah, future places that
need to write/read version constants (e.g. LUCENE-5954) will have to dup this
code, but then the format is clearly owned by that writer/reader. Already we
are debating 4 vs 3 ints (format change...).
bq. Can we encode 3 ints instead of 4? As far as I know, the 'prerelease' was
added to support 4.0-alpha/4.0-beta. This was confusing (my fault), and this
confusion ultimately worked its way into an index corruption bug. I think we
should try to contain it to 4.0 instead and not keep things complicated like
that.
OK... but should we expect never to use prerelease again (e.g. for 5.0)?
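To make the 3-int option concrete, here is a minimal sketch of a writer-private encoding. It is not the patch on this issue: `SegmentVersionCodec` and its methods are hypothetical names, and `java.io.DataOutputStream`/`DataInputStream` stand in for Lucene's own output/input classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper (not Lucene code): the segment-info writer/reader
// owns the on-disk encoding of the version as exactly three ints.
final class SegmentVersionCodec {
  private SegmentVersionCodec() {}

  // Writes major, minor, bugfix in that order; no prerelease component.
  static void write(DataOutputStream out, int major, int minor, int bugfix) throws IOException {
    out.writeInt(major);
    out.writeInt(minor);
    out.writeInt(bugfix);
  }

  // Reads the three ints back in the same order they were written.
  static int[] read(DataInputStream in) throws IOException {
    int major = in.readInt();
    int minor = in.readInt();
    int bugfix = in.readInt();
    return new int[] {major, minor, bugfix};
  }

  // Convenience for trying the encoding in memory.
  static int[] roundTrip(int major, int minor, int bugfix) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        write(out, major, minor, bugfix);
      }
      try (DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return read(in);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Keeping the encoding private to the writer/reader means a later switch between 3 and 4 ints is a format change in one place, at the cost of duplicating this code wherever else versions are written.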
bq. Can we consider just making a new 5.0 si writer? It's a pain to bump the
codec version, but I'll do the work here. We can remove conditionals like
'supports checksums' as well.
+1
Separately we should make it easier to roll a new Codec version ... it's bad if
it's "daunting" since it pressures us to hide biggish changes under the
existing writers.
bq. We can followup with this by improving the exceptions for tiny "slurp-in"
classes like this (I would personally, as in do the work, also fix .fnm,
segments_N, .nvm, .dvm, .fdt, .tvx as well). I would add a
CodecUtil.addSuppressedChecksum or something, to easily allow these guys to
'annotate' any exc on init with checksum failure information. These are small
but important and it would help considering we are dodging challenges like JVM
bugs here.
Big +1: this would mean on any strange exc when reading these files, we would
also see if (in addition) their checksum did or did not match? This saves the
extra hassle of asking the user to run CheckIndex to figure out if that file was
corrupt...
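Roughly, the idea could look like the sketch below. This is only an illustration under assumptions: `ChecksumAnnotator.annotate` is a hypothetical name (the proposed `CodecUtil.addSuppressedChecksum` does not exist yet), the file is assumed small enough to be held as a byte array, and `java.util.zip.CRC32` stands in for whatever checksum the file footer actually uses.

```java
import java.io.IOException;
import java.util.zip.CRC32;

// Hypothetical helper (not Lucene code): attaches the checksum verdict to an
// exception that was thrown while reading a small "slurp-in" file.
final class ChecksumAnnotator {
  private ChecksumAnnotator() {}

  // Recomputes the CRC32 of payload, compares it with expectedChecksum, and
  // records the outcome as a suppressed exception on the primary failure.
  static <T extends Throwable> T annotate(
      T primary, byte[] payload, long expectedChecksum, String resource) {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    long actual = crc.getValue();
    String detail;
    if (actual == expectedChecksum) {
      detail = "checksum passed";
    } else {
      detail = "checksum failed: expected=" + Long.toHexString(expectedChecksum)
          + " actual=" + Long.toHexString(actual);
    }
    primary.addSuppressed(new IOException(detail + " (resource=" + resource + ")"));
    return primary;
  }
}
```

A reader would then do something like `throw ChecksumAnnotator.annotate(e, bytes, storedChecksum, fileName);` in its catch block, so the stack trace shows both the original failure and whether the file's checksum matched.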
bq. I also want to bump 5.0 codec anyway, to fix the bug where
Lucene42TermVectorsFormat uses the same codecName as Lucene41StoredFieldsFormat
in the codec header, that's a stupid bug we should fix.
OK.
I think I'll break out the format change from this issue, and leave this as
just improving the Version error messages, having it not judge major version,
etc... I'll open a new issue for Lucene50Codec.
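For what "more descriptive, and not judging the major version" might mean in practice, here is a minimal sketch. It is not the attached patch: `LenientVersionParser`, its `parse` signature, the `resource` parameter, and the choice of `IllegalArgumentException` are all assumptions made for illustration.

```java
// Hypothetical sketch (not Version.java): parses "major.minor[.bugfix]" and
// reports the offending component, the full string, and the resource on error.
final class LenientVersionParser {
  private LenientVersionParser() {}

  static int[] parse(String version, String resource) {
    // limit -1 keeps trailing empty strings, so "4.10." is rejected below
    String[] parts = version.split("\\.", -1);
    if (parts.length < 2 || parts.length > 3) {
      throw new IllegalArgumentException(
          "Version must have 2 or 3 dot-separated components, got \"" + version
              + "\" (resource=" + resource + ")");
    }
    int[] result = new int[3]; // a missing bugfix component defaults to 0
    for (int i = 0; i < parts.length; i++) {
      try {
        result[i] = Integer.parseInt(parts[i]);
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException(
            "Invalid component \"" + parts[i] + "\" at position " + i
                + " in version \"" + version + "\" (resource=" + resource + ")", e);
      }
      if (result[i] < 0) {
        throw new IllegalArgumentException(
            "Negative component " + result[i] + " in version \"" + version
                + "\" (resource=" + resource + ")");
      }
    }
    // Deliberately no range check on the major version: deciding whether a
    // given release can read the index is left to the codec headers.
    return result;
  }
}
```

With this shape, a 4.10 reader handed a string like "5.1.0" gets a parsed version back instead of a parse failure, and a malformed string produces a message naming both the bad value and the file it came from.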
> Give Version parsing exceptions more descriptive error messages
> ---------------------------------------------------------------
>
> Key: LUCENE-5952
> URL: https://issues.apache.org/jira/browse/LUCENE-5952
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 4.10
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Blocker
> Fix For: 4.10.1, 5.0, Trunk
>
> Attachments: LUCENE-5952.patch, LUCENE-5952.patch, LUCENE-5952.patch,
> LUCENE-5952.patch, LUCENE-5952.patch, LUCENE-5952.patch
>
>
> As discussed on the dev list, it's spooky how Version.java tries to fully
> parse the incoming version string ... and then throw exceptions that lack
> details about what invalid value it received, which file contained the
> invalid value, etc.
> It also seems too low level to be checking versions (e.g. is not future proof
> for when 4.10 is passed a 5.x index by accident), and seems redundant with
> the codec headers we already have for checking versions?
> Should we just go back to lenient parsing?