[ 
https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964154#comment-13964154
 ] 

Uwe Schindler commented on LUCENE-5583:
---------------------------------------

+1 on having SkipBytes.

One other thing: currently ChecksumIndexInput throws 
UnsupportedOperationException if you try to seek, but a subclass then allows 
it again (forward only). Maybe we should move that code up to 
ChecksumIndexInput and document that seeking only works forward and may be 
costly (because it has to read the skipped bytes)? In that case we would 
implement it with skipBytes(), too. This would allow using 
ChecksumIndexInput for other codec parts while merging as well.
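
To make the forward-only seek idea concrete, here is a rough sketch in plain 
Java. It is not Lucene code: ForwardSeekChecksumInput is a hypothetical 
stand-in built on java.io.InputStream and java.util.zip.CRC32, and the 
method names (readBytes, skipBytes, seek, getFilePointer, getChecksum) are 
only assumed to mirror what the real class might expose. The point is that 
seek() becomes a thin wrapper around skipBytes(), which has to read every 
skipped byte so that the checksum stays correct.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical stand-in for a checksumming input; NOT Lucene's actual class.
class ForwardSeekChecksumInput {
  private final InputStream in;
  private final CRC32 digest = new CRC32();
  // Per-instance scratch buffer, so skipped bytes are never shared.
  private final byte[] skipBuffer = new byte[1024];
  private long pos = 0;

  ForwardSeekChecksumInput(InputStream in) {
    this.in = in;
  }

  long getFilePointer() {
    return pos;
  }

  long getChecksum() {
    return digest.getValue();
  }

  // Reads exactly len bytes and feeds them to the digest.
  void readBytes(byte[] b, int offset, int len) throws IOException {
    int done = 0;
    while (done < len) {
      int n = in.read(b, offset + done, len - done);
      if (n < 0) {
        throw new EOFException("read past EOF");
      }
      done += n;
    }
    digest.update(b, offset, len);
    pos += len;
  }

  // Skipping still reads the bytes, because the checksum must cover them.
  void skipBytes(long numBytes) throws IOException {
    if (numBytes < 0) {
      throw new IllegalArgumentException("numBytes must be >= 0: " + numBytes);
    }
    long skipped = 0;
    while (skipped < numBytes) {
      int toRead = (int) Math.min(numBytes - skipped, skipBuffer.length);
      readBytes(skipBuffer, 0, toRead);
      skipped += toRead;
    }
  }

  // Forward-only seek: costs as much as reading (target - pos) bytes.
  void seek(long target) throws IOException {
    long delta = target - pos;
    if (delta < 0) {
      throw new IllegalStateException(
          "cannot seek backwards: pos=" + pos + " target=" + target);
    }
    skipBytes(delta);
  }
}
```

Which exception a backward seek should throw is an open choice; 
IllegalStateException is used here only for illustration.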

> Should BufferedChecksumIndexInput have its own buffer?
> ------------------------------------------------------
>
>                 Key: LUCENE-5583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5583
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.8
>            Reporter: Adrien Grand
>
> I was playing with on-the-fly checksum verification and this made me stumble 
> upon an issue with {{BufferedChecksumIndexInput}}.
> I have some code that skips over a {{DataInput}} by reading bytes into 
> /dev/null, eg.
> {code}
>   private static final byte[] SKIP_BUFFER = new byte[1024];
>   private static void skipBytes(DataInput in, long numBytes) throws IOException {
>     assert numBytes >= 0;
>     for (long skipped = 0; skipped < numBytes; ) {
>       final int toRead = (int) Math.min(numBytes - skipped, SKIP_BUFFER.length);
>       in.readBytes(SKIP_BUFFER, 0, toRead);
>       skipped += toRead;
>     }
>   }
> {code}
> It is fine to read into this static buffer, even from multiple threads, since 
> the content that is read doesn't matter here. However, it breaks with 
> {{BufferedChecksumIndexInput}} because of the way that it updates the 
> checksum:
> {code}
>   @Override
>   public void readBytes(byte[] b, int offset, int len)
>     throws IOException {
>     main.readBytes(b, offset, len);
>     digest.update(b, offset, len);
>   }
> {code}
> If you are unlucky and a concurrent call to {{skipBytes}} starts modifying 
> the content of {{b}} before the call to 
> {{digest.update(b, offset, len)}} has finished, then your checksum will be 
> wrong.
> I think we should make {{BufferedChecksumIndexInput}} read into a private 
> buffer first instead of relying on the user-provided buffer.
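
For illustration, the private-buffer idea from the issue description could 
look roughly like the sketch below. It is plain Java and not the actual 
Lucene implementation: PrivateBufferChecksumInput, its 256-byte buffer size 
and the readFully helper are all made up for this example, and CRC32 stands 
in for whatever digest is really used. The checksum is computed from the 
private buffer, and only afterwards are the bytes copied into the 
caller-provided array, so concurrent writes to that array cannot change what 
gets checksummed.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical sketch; NOT the real BufferedChecksumIndexInput.
class PrivateBufferChecksumInput {
  private final InputStream main;
  private final CRC32 digest = new CRC32();
  // Private buffer: no other thread can touch it between read and update.
  private final byte[] buffer = new byte[256];

  PrivateBufferChecksumInput(InputStream main) {
    this.main = main;
  }

  long getChecksum() {
    return digest.getValue();
  }

  void readBytes(byte[] b, int offset, int len) throws IOException {
    while (len > 0) {
      int chunk = Math.min(len, buffer.length);
      readFully(chunk);
      // Checksum the private copy, never the caller's array.
      digest.update(buffer, 0, chunk);
      System.arraycopy(buffer, 0, b, offset, chunk);
      offset += chunk;
      len -= chunk;
    }
  }

  // Fills the first len bytes of the private buffer or fails with EOF.
  private void readFully(int len) throws IOException {
    int done = 0;
    while (done < len) {
      int n = main.read(buffer, done, len - done);
      if (n < 0) {
        throw new EOFException("read past EOF");
      }
      done += n;
    }
  }
}
```

The price is one extra copy per read, which is the trade-off to weigh 
against the safety of not depending on the caller's buffer.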


