[ 
https://issues.apache.org/jira/browse/PDFBOX-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648501#comment-13648501
 ] 

Maruan Sahyoun commented on PDFBOX-1582:
----------------------------------------

Hi,

thanks for providing these tests - a very useful exercise. Running your tests 
and some others, RandomAccessFileInputStream behaves inconsistently if, during 
instantiation, you pass an offset and a length that reach beyond the actual 
length of the file. E.g. if you pass in = new 
RandomAccessFileInputStream(raFile, 98, 2) instead of in = new 
RandomAccessFileInputStream(raFile, 98, 10), available() always returns 0 at 
EOF and so does skip() (for skip() that differs from java.io.FileInputStream, 
btw).

So from that I think you would expect available() to return 0 after EOF, and 
skip() too, but the handling of instantiation in cases where offset and length 
go beyond the actual data needs some review. In addition, maybe skip() could 
be changed to be in line with java.io.FileInputStream.
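
To make the expectation above concrete, here is a rough sketch of a bounded 
stream that clamps the requested window to the real file length, so that 
available() and skip() both report 0 once the data is exhausted. This is only 
an illustration: BoundedRafInputStream is a hypothetical class wrapping 
java.io.RandomAccessFile, not the PDFBox RandomAccessFileInputStream, and the 
100-byte file in main() is an assumed test fixture.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;

// Hypothetical illustration only; not PDFBox's RandomAccessFileInputStream.
class BoundedRafInputStream extends InputStream {
    private final RandomAccessFile file;
    private final long end;  // exclusive end offset, clamped to the file length
    private long position;   // next absolute file offset to read

    BoundedRafInputStream(RandomAccessFile file, long offset, long length) throws IOException {
        this.file = file;
        long fileLength = file.length();
        // Clamp both ends so the window never extends past the real data.
        this.position = Math.min(Math.max(offset, 0), fileLength);
        long requested = Math.max(length, 0);
        this.end = requested > fileLength - position ? fileLength : position + requested;
    }

    @Override
    public int available() {
        // Never negative, and exactly 0 once the window is exhausted.
        return (int) Math.min(end - position, Integer.MAX_VALUE);
    }

    @Override
    public int read() throws IOException {
        if (position >= end) {
            return -1; // EOF: position and available() stay unchanged
        }
        file.seek(position);
        int b = file.read();
        if (b >= 0) {
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (position >= end) {
            return -1;
        }
        file.seek(position);
        int n = file.read(buf, off, (int) Math.min(len, end - position));
        if (n > 0) {
            position += n;
        }
        return n;
    }

    @Override
    public long skip(long n) {
        // Report only the bytes actually skipped inside the window.
        long skipped = Math.max(0, Math.min(n, end - position));
        position += skipped;
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("raf-demo", ".bin");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), new byte[100]); // assumed 100-byte test file
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "r")) {
            // Asks for 10 bytes at offset 98, but only 2 bytes exist there.
            BoundedRafInputStream in = new BoundedRafInputStream(raf, 98, 10);
            System.out.println(in.available()); // 2
            System.out.println(in.skip(5));     // 2
            System.out.println(in.read());      // -1
            System.out.println(in.available()); // 0
        }
    }
}
```

Clamping at construction time is just one possible choice; rejecting an 
out-of-range offset/length with an exception would be another. Note also that 
this skip() deliberately stops at EOF, which is not what 
java.io.FileInputStream does.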

Having said all that, this may or may not be the root cause of the issues you 
are getting. Could you open another case explaining what you are trying to 
achieve, with sample code and sample PDF(s), so we can review that?

BR
Maruan
 
                
> Issues with available() and skip() on RandomAccessFileInputStream
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-1582
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1582
>             Project: PDFBox
>          Issue Type: Test
>          Components: Parsing
>    Affects Versions: 1.8.1
>            Reporter: Fredrik Kjellberg
>            Priority: Minor
>         Attachments: TestRandomAccessFileInputStream_diff.txt
>
>
> I'm trying to track down a strange bug when parsing PDF files on the IBM JDK 
> that sometimes is giving me stack traces from RandomAccessFile classes. I 
> started by writing unit tests for the PDFBox classes to verify their behavior 
> and found a few issues. Can someone more familiar with the PDFBox code base 
> please check the unit test I wrote and give advice on how it is supposed to 
> work? I've added a TODO for each line where I'm in doubt what should be 
> returned.
> This unit test is for RandomAccessFileInputStream where I've found a few 
> issues. The first is what available() is supposed to return if the input 
> stream tries to go beyond the EOF of the underlying file? When reading single 
> bytes it counts down while still returning -1, and when reading a buffer it 
> is returning what it thinks is left. The JDK documentation states that 
> available() may not return the absolute truth, so perhaps returning what it 
> thinks is left is okay, but it shouldn't count down in single reads beyond 
> EOF? Maybe it should be set to zero once a read beyond the EOF is detected?
> Another issue is with skip() where the JDK documentation states that it 
> should return the actual number of bytes skipped. When skipping beyond the 
> EOF of the file, it does not return the actual number of skipped bytes. Also 
> the underlying file is not updated with the new position. Is this correct 
> behavior?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
