[ 
https://issues.apache.org/jira/browse/LUCENE-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736802#comment-16736802
 ] 

Armin Braun commented on LUCENE-8525:
-------------------------------------

{quote}If the bits were wrong, we don't know why. Maybe it's just a hardware 
memory issue, maybe HotSpot compiled the code wrong, maybe it's a bug in Lucene 
code, maybe it's something else.

So I think if we hit EOF, the correct thing to do is throw EOFException, that's 
as specific as it gets.
{quote}
 

And that's fine. As I said 
[above|https://issues.apache.org/jira/browse/LUCENE-8525?focusedCommentId=16736342&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16736342],
 EOF is a borderline case and Lucene not handling it is OK. What I'm looking for 
is simply some differentiation between "wrong bytes" and "other IO issue". 
Whether the bytes are only temporarily wrong isn't really important; it's 
probably an unrecoverable issue either way. What is important is that there is 
a way to figure out whether Lucene read broken bytes or was unable to read 
bytes at all.
EOF is the somewhat ambiguous corner case here, as you point out, and it's fine 
not to handle it in Lucene, since I don't see how always interpreting it as 
corrupt data would ever be wrong at a meaningful rate. But if Lucene throws 
EOFException when, e.g., we fail to read the number of bytes some 'length' 
field suggested we can read, but throws a plain IOException when some bits 
don't match what we expected, then that's:

a. somewhat inconsistent
b. really hard to properly handle in the calling code.

isn't it?
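
To make the calling-code side concrete, here is a minimal sketch of what a caller could do if "wrong bytes" had its own exception type. Everything in it is hypothetical and only for illustration: {{CorruptDataException}} is a stand-in I made up here (it is not a Lucene class), as are {{ReadAction}}, {{Failure}} and {{classify}}. It also assumes the caller chooses to treat EOF as corrupt data, as discussed above.

```java
import java.io.EOFException;
import java.io.IOException;

public class ReadFailureClassifier {

    /** Hypothetical stand-in for a dedicated "wrong bytes" exception (not a Lucene class). */
    public static class CorruptDataException extends IOException {
        public CorruptDataException(String message) {
            super(message);
        }
    }

    /** The two cases the caller wants to tell apart, plus success. */
    public enum Failure { NONE, CORRUPT_DATA, OTHER_IO }

    /** Hypothetical read operation that may fail with any IOException. */
    @FunctionalInterface
    public interface ReadAction {
        void read() throws IOException;
    }

    /**
     * Runs the read and classifies the outcome. EOF is treated as corrupt data
     * here; that is a choice made by the caller, not by the library.
     */
    public static Failure classify(ReadAction action) {
        try {
            action.read();
            return Failure.NONE;
        } catch (CorruptDataException | EOFException e) {
            // The bytes were read but are wrong, or fewer bytes exist than promised.
            return Failure.CORRUPT_DATA;
        } catch (IOException e) {
            // Anything else: the bytes could not be read at all.
            return Failure.OTHER_IO;
        }
    }
}
```

With only a plain IOException for the "bits don't match" case, the first catch block cannot be written and the caller is left guessing from exception messages.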

> throw more specific exception on data corruption
> ------------------------------------------------
>
>                 Key: LUCENE-8525
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8525
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Vladimir Dolzhenko
>            Priority: Major
>
> DataInput throws a generic IOException if data looks odd
> [DataInput:141|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L141]
> there are other examples like 
> [BufferedIndexInput:219|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java#L219],
>  
> [CompressionMode:226|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L226]
>  and maybe 
> [DocIdsWriter:81|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java#L81]
> That leads to some difficulties - see [elasticsearch 
> #34322|https://github.com/elastic/elasticsearch/issues/34322]
> It would be better if it threw a more specific exception.
> As a consequence 
> [SegmentInfos.readCommit|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L281]
>  violates its own contract
> {code:java}
> /**
>    * @throws CorruptIndexException if the index is corrupt
>    * @throws IOException if there is a low-level IO error
>    */
> {code}


