[ https://issues.apache.org/jira/browse/LUCENE-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736802#comment-16736802 ]
Armin Braun commented on LUCENE-8525:
-------------------------------------

{quote}If the bits were wrong, we don't know why. Maybe its just a hardware memory issue, maybe hotspot compiled the code wrong, maybe its a bug in lucene code, maybe its something else. So I think if we hit EOF, the correct thing to do is throw EOFException, thats as specific as it gets. {quote}

And that's fine. As I said [above|https://issues.apache.org/jira/browse/LUCENE-8525?focusedCommentId=16736342&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16736342], EOF is a borderline case and it is OK for Lucene not to handle it specially.

What I'm looking for is simply some differentiation between "wrong bytes" and "other IO issue". Whether the bytes are only temporarily wrong isn't really important; it's probably an unrecoverable issue either way. What is important is that there is a way to figure out whether Lucene read broken bytes or was unable to read bytes at all.

EOF is the somewhat ambiguous corner case here, as you point out, and it's fine not to handle it in Lucene, since I don't see how always interpreting it as corrupt data would ever produce a meaningful error rate.

But if Lucene throws EOFException when, e.g., we fail to read the number of bytes that some 'length' field suggested we could read, but throws a plain IOException when some bits don't match what we expected, then that's:

a. somewhat inconsistent
b. really hard to handle properly in the calling code

isn't it?
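To make the distinction concrete, here is a minimal sketch of what calling code could look like if "wrong bytes" had its own exception type. This is not Lucene code: {{CorruptDataException}}, {{ReadFailureDemo}}, the {{MAGIC}} header, {{MAX_LENGTH}} and the record layout are all hypothetical names invented for illustration, and only JDK classes are used. It assumes a record of the form "int header, int length, payload".

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for a dedicated "wrong bytes" exception type.
class CorruptDataException extends IOException {
    CorruptDataException(String message) {
        super(message);
    }
}

class ReadFailureDemo {
    static final int MAGIC = 0xCAFED00D; // hypothetical header value
    static final int MAX_LENGTH = 1024;  // hypothetical sanity limit for the demo

    // Reads one record: int header, int length, then `length` payload bytes.
    static byte[] readRecord(DataInputStream in) throws IOException {
        int magic = in.readInt(); // throws EOFException if fewer than 4 bytes remain
        if (magic != MAGIC) {
            // The bytes were read successfully but are not what we expected.
            throw new CorruptDataException("bad header: 0x" + Integer.toHexString(magic));
        }
        int length = in.readInt();
        if (length < 0 || length > MAX_LENGTH) {
            throw new CorruptDataException("implausible length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // throws EOFException if the stream ends early
        return payload;
    }

    // The caller can tell the failure modes apart by exception type alone.
    static String classify(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            readRecord(in);
            return "ok";
        } catch (CorruptDataException e) {
            return "corrupt";   // read bytes, but they were wrong
        } catch (EOFException e) {
            return "truncated"; // ambiguous corner case: ran out of bytes
        } catch (IOException e) {
            return "io-error";  // could not read bytes for some other reason
        }
    }

    public static void main(String[] args) {
        byte[] good = {(byte) 0xCA, (byte) 0xFE, (byte) 0xD0, 0x0D, 0, 0, 0, 2, 1, 2};
        byte[] badHeader = {0, (byte) 0xFE, (byte) 0xD0, 0x0D, 0, 0, 0, 2, 1, 2};
        byte[] tooShort = {(byte) 0xCA, (byte) 0xFE, (byte) 0xD0, 0x0D, 0, 0, 0, 2, 1};
        System.out.println(classify(good));
        System.out.println(classify(badHeader));
        System.out.println(classify(tooShort));
    }
}
```

If the header mismatch were reported as a plain IOException instead, the first and third catch branches would be indistinguishable, which is the handling problem described in point b.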
> throw more specific exception on data corruption
> ------------------------------------------------
>
>                 Key: LUCENE-8525
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8525
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Vladimir Dolzhenko
>            Priority: Major
>
> DataInput throws generic IOException if data looks odd
> [DataInput:141|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L141]
> there are other examples like
> [BufferedIndexInput:219|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java#L219],
> [CompressionMode:226|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L226]
> and maybe
> [DocIdsWriter:81|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java#L81]
> That leads to some difficulties - see [elasticsearch #34322|https://github.com/elastic/elasticsearch/issues/34322]
> It would be better if it throws more specific exception.
> As a consequence
> [SegmentInfos.readCommit|https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L281]
> violates its own contract
> {code:java}
> /**
>  * @throws CorruptIndexException if the index is corrupt
>  * @throws IOException if there is a low-level IO error
>  */
> {code}