[
https://issues.apache.org/jira/browse/LUCENE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141633#comment-14141633
]
Robert Muir commented on LUCENE-5952:
-------------------------------------
Thanks for beefing this up. The .si file is really central to the segment, so
any safety we can add is good.
A few questions:
* Can we encode 3 ints instead of 4? As far as I know, the 'prerelease' was
added to support 4.0-alpha/4.0-beta. This was confusing (my fault), and this
confusion ultimately worked its way into an index corruption bug. I think we
should try to contain it to 4.0 instead and not keep things complicated like
that.
* Can we consider just making a new 5.0 si writer? It's a pain to bump the codec
version, but I'll do the work here. We can remove conditionals like 'supports
checksums' as well.
* I agree we should put these methods in CodecUtil (CodecUtil.readVersion,
writeVersion). To answer Uwe's questions about why a format change is needed
for the version: IMO it's way better to encode this in a way that does not
require parsing.
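
To make the "3 ints, no parsing" idea concrete, here is a minimal sketch. It is
not the patch: SimpleVersion and VersionCodecSketch are hypothetical names, and
java.io.DataOutputStream/DataInputStream stand in for Lucene's
DataOutput/DataInput so that the example runs without Lucene on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: encode a version as three ints instead of a string.
final class VersionCodecSketch {

    /** Hypothetical stand-in for a version: major.minor.bugfix, no prerelease. */
    static final class SimpleVersion {
        final int major, minor, bugfix;

        SimpleVersion(int major, int minor, int bugfix) {
            this.major = major;
            this.minor = minor;
            this.bugfix = bugfix;
        }

        @Override
        public String toString() {
            return major + "." + minor + "." + bugfix;
        }
    }

    /** Writes the version as three fixed-width ints; nothing to parse on read. */
    static void writeVersion(DataOutputStream out, SimpleVersion v) throws IOException {
        out.writeInt(v.major);
        out.writeInt(v.minor);
        out.writeInt(v.bugfix);
    }

    /** Reads three ints; on a bad value, reports the value and the resource. */
    static SimpleVersion readVersion(DataInputStream in, String resource) throws IOException {
        int major = in.readInt();
        int minor = in.readInt();
        int bugfix = in.readInt();
        if (major < 0 || minor < 0 || bugfix < 0) {
            throw new IOException("invalid version: " + major + "." + minor + "." + bugfix
                + " (resource=" + resource + ")");
        }
        return new SimpleVersion(major, minor, bugfix);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeVersion(new DataOutputStream(bytes), new SimpleVersion(4, 10, 1));
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(readVersion(in, "_0.si"));
    }
}
```

The point of the design is that the reader never sees a string, so there is no
lenient-versus-strict parsing question, and any error message can name both the
offending values and the file they came from.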
We can follow up on this by improving the exceptions for tiny "slurp-in"
classes like this (I would personally do the work, and would also fix .fnm,
segments_N, .nvm, .dvm, .fdt, .tvx as well). I would add a
CodecUtil.addSuppressedChecksum or something, to easily allow these guys to
'annotate' any exception on init with checksum failure information. These are
small but important, and it would help considering we are dodging challenges
like JVM bugs here.
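
A sketch of what such an 'annotate' helper could look like, using
Throwable.addSuppressed and java.util.zip.CRC32. Everything here is an
assumption for illustration: the class name, the signature, and the idea of
passing the payload as a byte array. A real version would work against the
codec's checksummed input instead.

```java
import java.io.IOException;
import java.util.zip.CRC32;

// Hypothetical sketch: attach checksum status to an exception thrown on init.
final class ChecksumAnnotationSketch {

    /**
     * Recomputes the CRC32 of the payload and records the outcome as a
     * suppressed exception on the original failure, then returns it.
     */
    static <T extends Throwable> T addSuppressedChecksum(
            T priorException, byte[] payload, long expectedChecksum, String resource) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        long actual = crc.getValue();
        if (actual != expectedChecksum) {
            // The bytes on disk are damaged, so the original failure is likely a symptom.
            priorException.addSuppressed(new IOException(
                "checksum failed: expected=" + Long.toHexString(expectedChecksum)
                    + " actual=" + Long.toHexString(actual)
                    + " (resource=" + resource + ")"));
        } else {
            // The bytes are intact, so the cause lies elsewhere (e.g. a bug).
            priorException.addSuppressed(new IOException(
                "checksum passed: " + Long.toHexString(actual)
                    + " (resource=" + resource + ")"));
        }
        return priorException;
    }
}
```

A reader would call this in its catch block and rethrow the returned exception,
so the stack trace shows whether the file itself was corrupt.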
I also want to bump the 5.0 codec anyway, to fix the bug where
Lucene42TermVectorsFormat uses the same codecName as Lucene41StoredFieldsFormat
in the codec header. That's a stupid bug we should fix.
> Give Version parsing exceptions more descriptive error messages
> ---------------------------------------------------------------
>
> Key: LUCENE-5952
> URL: https://issues.apache.org/jira/browse/LUCENE-5952
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 4.10
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Blocker
> Fix For: 4.10.1, 5.0, Trunk
>
> Attachments: LUCENE-5952.patch, LUCENE-5952.patch, LUCENE-5952.patch,
> LUCENE-5952.patch, LUCENE-5952.patch, LUCENE-5952.patch
>
>
> As discussed on the dev list, it's spooky how Version.java tries to fully
> parse the incoming version string ... and then throw exceptions that lack
> details about what invalid value it received, which file contained the
> invalid value, etc.
> It also seems too low level to be checking versions (e.g. is not future proof
> for when 4.10 is passed a 5.x index by accident), and seems redundant with
> the codec headers we already have for checking versions?
> Should we just go back to lenient parsing?