[ 
https://issues.apache.org/jira/browse/HADOOP-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578244#comment-13578244
 ] 

Todd Lipcon commented on HADOOP-9307:
-------------------------------------

An example sequence of seeks which returns the wrong data is as follows, 
assuming a 4096-byte buffer:

{code}
seek(0);
readFully(1);
{code}

This primes the buffer. After this, the current state of the buffered stream is 
{{pos=1, count=4096, filepos=4096}}

{code}
seek(2000);
{code}

The seek sees that the required data is already in the buffer, and just sets 
{{pos=2000}}

{code}
readFully(10000);
{code}

This first copies the remaining 2096 bytes from the buffer and sets 
{{pos=4096}}. Then, because 7904 bytes are remaining, and this is larger than 
the buffer size, it copies them directly into the user-supplied output buffer. 
This leaves the state of the stream at {{pos=4096, count=4096, filepos=12000}}

{code}
seek(11000);
{code}

The "optimization" in BufferedFSInputStream sees that there are 4096 buffered 
bytes, and that this seek is supposedly within the window, assuming that those 
4096 bytes directly precede filepos. So, it erroneously just sets {{pos=3096}}.

The next read will then get the wrong results for the first 1000 bytes -- 
yielding bytes 3096-4096 of the file instead of bytes 11000-12000.
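
The sequence above can be replayed against a small in-memory model. This is only a sketch, not the Hadoop source: the class {{SeekBugSketch}}, its {{makeFile}} and {{state}} helpers, and the 16384-byte test file are all hypothetical, and the read path is assumed to behave like java.io.BufferedInputStream (reads at least as large as the buffer bypass it once the buffer is drained).

```java
import java.util.Arrays;

// Simplified model of a buffered, seekable stream (hypothetical; not Hadoop code).
public class SeekBugSketch {
    static final int BUF_SIZE = 4096;

    private final byte[] file;                    // stands in for the underlying file
    private final byte[] buf = new byte[BUF_SIZE];
    private int pos = 0;                          // next byte to hand out from buf
    private int count = 0;                        // number of valid bytes in buf
    private long filepos = 0;                     // position of the underlying stream

    public SeekBugSketch(byte[] file) { this.file = file; }

    // Test data: each byte depends on its offset, so misplaced reads are visible.
    public static byte[] makeFile(int size) {
        byte[] f = new byte[size];
        for (int i = 0; i < size; i++) f[i] = (byte) (i % 251);
        return f;
    }

    // The flawed optimization: assumes buf always holds the bytes
    // [filepos - count, filepos), which is false after a direct read.
    public void seek(long target) {
        long end = filepos;
        long start = end - count;
        if (target >= start && target < end) {
            pos = (int) (target - start);
            return;
        }
        pos = 0;                                  // invalidate buffer, seek for real
        count = 0;
        filepos = target;
    }

    public byte[] readFully(int n) {
        byte[] out = new byte[n];
        int done = 0;
        while (done < n) {
            int remaining = n - done;
            if (pos >= count) {                   // buffer drained
                if (remaining >= BUF_SIZE) {
                    // Large read: copy straight from the file. buf and count are
                    // left untouched while filepos moves on.
                    System.arraycopy(file, (int) filepos, out, done, remaining);
                    filepos += remaining;
                    return out;
                }
                int filled = (int) Math.min(BUF_SIZE, file.length - filepos);
                if (filled <= 0) throw new IllegalStateException("EOF");
                System.arraycopy(file, (int) filepos, buf, 0, filled);
                count = filled;
                filepos += filled;
                pos = 0;
            }
            int chunk = Math.min(count - pos, remaining);
            System.arraycopy(buf, pos, out, done, chunk);
            pos += chunk;
            done += chunk;
        }
        return out;
    }

    public String state() {
        return "pos=" + pos + ", count=" + count + ", filepos=" + filepos;
    }

    public static void main(String[] args) {
        byte[] file = makeFile(16384);
        SeekBugSketch s = new SeekBugSketch(file);
        s.seek(0);
        s.readFully(1);                           // primes the buffer
        s.seek(2000);
        s.readFully(10000);                       // 2096 from buf, 7904 direct
        s.seek(11000);                            // wrongly treated as in-buffer
        System.out.println(s.state());
        byte[] got = s.readFully(1000);
        System.out.println("matches file bytes 3096-4096: "
                + Arrays.equals(got, Arrays.copyOfRange(file, 3096, 4096)));
        System.out.println("matches file bytes 11000-12000: "
                + Arrays.equals(got, Arrays.copyOfRange(file, 11000, 12000)));
    }
}
```

In this model the final read returns the stale buffer contents, because the direct read advanced {{filepos}} without invalidating the buffer.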
                
> BufferedFSInputStream.read returns wrong results after certain seeks
> --------------------------------------------------------------------
>
>                 Key: HADOOP-9307
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9307
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 1.1.1, 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> After certain sequences of seek/read, BufferedFSInputStream can silently 
> return data from the wrong part of the file. Further description in first 
> comment below.
