[ 
https://issues.apache.org/jira/browse/LUCENE-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141633#comment-14141633
 ] 

Robert Muir commented on LUCENE-5952:
-------------------------------------

Thanks for beefing this up. The .si file is really central to the segment, so 
any safety we can add is good.

A few questions:
* Can we encode 3 ints instead of 4? As far as I know, the 'prerelease' was 
added to support 4.0-alpha/4.0-beta. This was confusing (my fault), and this 
confusion ultimately worked its way into an index corruption bug. I think we 
should try to contain it to 4.0 instead and not keep things complicated like 
that.
* Can we consider just making a new 5.0 si writer? It's a pain to bump the codec 
version, but I'll do the work here. We can remove conditionals like 'supports 
checksums' as well. 
* I agree we should put these methods in CodecUtil (CodecUtil.readVersion, 
writeVersion). To answer Uwe's questions about why a format change is needed 
for the version, IMO it's way better to encode this in a way that does not 
require parsing.
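
To illustrate the "3 ints, no parsing" idea, here is a minimal sketch. It is 
not Lucene's actual CodecUtil or .si format: the class name VersionCodecSketch, 
the helpers encode/decode, the fixed-width int layout, and the "resource" 
argument are all assumptions made for the example, and it uses plain java.io 
streams rather than Lucene's DataOutput/DataInput.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch only; not Lucene's CodecUtil or its on-disk format.
final class VersionCodecSketch {

  /** Writes major, minor, bugfix as three fixed-width ints (no prerelease field). */
  static void writeVersion(DataOutputStream out, int major, int minor, int bugfix)
      throws IOException {
    out.writeInt(major);
    out.writeInt(minor);
    out.writeInt(bugfix);
  }

  /** Reads the three ints back; no string parsing is involved. */
  static int[] readVersion(DataInputStream in, String resource) throws IOException {
    int major = in.readInt();
    int minor = in.readInt();
    int bugfix = in.readInt();
    if (major < 0 || minor < 0 || bugfix < 0) {
      // Report the offending value and the file it came from.
      throw new IOException("invalid version " + major + "." + minor + "." + bugfix
          + " (resource=" + resource + ")");
    }
    return new int[] {major, minor, bugfix};
  }

  /** Convenience wrapper: encodes a version into a byte array. */
  static byte[] encode(int major, int minor, int bugfix) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      writeVersion(out, major, minor, bugfix);
    }
    return bytes.toByteArray();
  }

  /** Convenience wrapper: decodes a version from a byte array. */
  static int[] decode(byte[] data, String resource) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      return readVersion(in, resource);
    }
  }
}
```

With this layout a reader never has to interpret a version string, and a bad 
value can be reported together with the name of the file that contained it.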

We can follow up on this by improving the exceptions for tiny "slurp-in" 
classes like this (I would personally do the work to also fix .fnm, 
segments_N, .nvm, .dvm, .fdt, .tvx as well). I would add a 
CodecUtil.addSuppressedChecksum or something, to easily allow these guys to 
'annotate' any exc on init with checksum failure information. These are small 
but important and it would help considering we are dodging challenges like JVM 
bugs here.
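
A rough sketch of what such an annotation helper might look like, using only 
the JDK (Throwable.addSuppressed and java.util.zip.CRC32). The class name, the 
method signature, and the choice to pass the payload as a byte array are 
assumptions for illustration; a real helper would presumably work against the 
already-open checksummed input instead.

```java
import java.io.IOException;
import java.util.zip.CRC32;

// Hypothetical sketch only; not an existing Lucene API.
final class ChecksumAnnotationSketch {

  /** Computes the CRC32 of the payload. */
  static long crc32(byte[] payload) {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    return crc.getValue();
  }

  /**
   * Attaches checksum status to an exception that was already thrown while
   * reading a small file, so the caller sees both the original failure and
   * whether the file's bytes were intact.
   */
  static void addSuppressedChecksum(
      Throwable priorException, byte[] payload, long expectedChecksum, String resource) {
    long actual = crc32(payload);
    String status = (actual == expectedChecksum)
        ? "checksum passed"
        : "checksum failed: expected=" + Long.toHexString(expectedChecksum)
            + " actual=" + Long.toHexString(actual);
    priorException.addSuppressed(new IOException(status + " (resource=" + resource + ")"));
  }
}
```

The original exception stays the primary one; the checksum result only rides 
along as a suppressed exception, which helps tell real corruption apart from 
a reader bug.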

I also want to bump 5.0 codec anyway, to fix the bug where 
Lucene42TermVectorsFormat uses the same codecName as Lucene41StoredFieldsFormat 
in the codec header; that's a stupid bug we should fix.

> Give Version parsing exceptions more descriptive error messages
> ---------------------------------------------------------------
>
>                 Key: LUCENE-5952
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5952
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.10
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Blocker
>             Fix For: 4.10.1, 5.0, Trunk
>
>         Attachments: LUCENE-5952.patch, LUCENE-5952.patch, LUCENE-5952.patch, 
> LUCENE-5952.patch, LUCENE-5952.patch, LUCENE-5952.patch
>
>
> As discussed on the dev list, it's spooky how Version.java tries to fully 
> parse the incoming version string ... and then throw exceptions that lack 
> details about what invalid value it received, which file contained the 
> invalid value, etc.
> It also seems too low level to be checking versions (e.g. is not future proof 
> for when 4.10 is passed a 5.x index by accident), and seems redundant with 
> the codec headers we already have for checking versions?
> Should we just go back to lenient parsing?


