Re: pdfbox.io - which should I use

John Hewson Wed, 19 Feb 2014 00:25:35 -0800

Well spotted. Sticking with RandomAccessRead looks good then, with wrappers for 
NIO, etc.
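
For illustration only, a wrapper along these lines might look like the sketch below. SeekableRead is a hypothetical stand-in for RandomAccessRead (the real interface in org.apache.pdfbox.io may declare different methods), and ByteBufferRead and mapFile are names made up for this example. It relies only on FileChannel and ByteBuffer, which predate Java 1.7:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical stand-in for org.apache.pdfbox.io.RandomAccessRead;
// the real interface may declare different methods.
interface SeekableRead {
    int read() throws IOException;
    int read(byte[] b, int offset, int length) throws IOException;
    void seek(long position) throws IOException;
    long getPosition() throws IOException;
    long length() throws IOException;
}

// Wraps any ByteBuffer: a heap buffer for tests, or a memory-mapped file.
final class ByteBufferRead implements SeekableRead {
    private final ByteBuffer buffer;

    ByteBufferRead(ByteBuffer source) {
        // slice() shares the bytes but has its own position, starting at 0.
        this.buffer = source.slice();
    }

    // Maps the whole file read-only. A single mapping cannot exceed
    // Integer.MAX_VALUE bytes, so larger files would need several mappings.
    static ByteBufferRead mapFile(File file) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            FileChannel channel = raf.getChannel();
            return new ByteBufferRead(
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()));
        } finally {
            raf.close(); // an established mapping stays valid after close
        }
    }

    public int read() {
        // Mask to return an unsigned byte value, or -1 at end of data.
        return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
    }

    public int read(byte[] b, int offset, int length) {
        if (length == 0) {
            return 0;
        }
        if (!buffer.hasRemaining()) {
            return -1;
        }
        int count = Math.min(length, buffer.remaining());
        buffer.get(b, offset, count);
        return count;
    }

    public void seek(long position) throws IOException {
        if (position < 0 || position > buffer.limit()) {
            throw new IOException("Seek position out of range: " + position);
        }
        buffer.position((int) position);
    }

    public long getPosition() {
        return buffer.position();
    }

    public long length() {
        return buffer.limit();
    }
}
```

A parser written against the interface would not need to know whether the bytes come from ByteBuffer.wrap(someByteArray) in a test or from mapFile(...) on a real document.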


-- John

> On 18 Feb 2014, at 23:29, Maruan Sahyoun <[email protected]> wrote:
> 
> Hi John,
> 
> I forgot that SeekableByteChannel was only added in Java 1.7.
> 
> BR
> Maruan Sahyoun
> 
>> On 19 Feb 2014, at 04:45, John Hewson <[email protected]> wrote:
>> 
>> RandomAccessRead looks like it could be replaced with 
>> java.nio.channels.SeekableByteChannel as implemented by 
>> java.nio.channels.FileChannel.
>> 
>> -- John
>> 
>>> On 18 Feb 2014, at 12:50, Maruan Sahyoun <[email protected]> wrote:
>>> 
>>> Yes, we could use RandomAccessRead as the base, with implementations that 
>>> wrap NIO and other sources. 
>>> 
>>> Then the parsers would use RandomAccessRead.
>>> 
>>> WDYT
>>> 
>>> Maruan Sahyoun
>>> 
>>>> On 18 Feb 2014, at 21:42, John Hewson <[email protected]> wrote:
>>>> 
>>>> The streams used by BaseParser and PDFParser are sequential, so you can 
>>>> ignore them.
>>>> Use of PushBackInputStream in the non-sequential parser seems a little 
>>>> odd. 
>>>> 
>>>> We might want to think about getting rid of the classes in 
>>>> org.apache.pdfbox.io and replacing them with classes from 
>>>> java.nio.channels. It looks like the PDFBox classes pre-date NIO. With NIO 
>>>> we could use memory-mapped files, which for large PDF files will perform 
>>>> better than an InputStream.
>>>> 
>>>> -- John
>>>> 
>>>>> On 18 Feb 2014, at 03:53, Maruan Sahyoun <[email protected]> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> There are currently several options to use as the base for a potential 
>>>>> new parser/lexer. The ones currently in use are:
>>>>> 
>>>>> BaseParser: 
>>>>> import org.apache.pdfbox.io.PushBackInputStream;
>>>>> import org.apache.pdfbox.io.RandomAccess;
>>>>> 
>>>>> PDFParser (additional):
>>>>> import org.apache.pdfbox.io.RandomAccess;
>>>>> 
>>>>> NonSequentialParser:
>>>>> import org.apache.pdfbox.io.PushBackInputStream;
>>>>> import org.apache.pdfbox.io.RandomAccess;
>>>>> import org.apache.pdfbox.io.RandomAccessBuffer;
>>>>> import org.apache.pdfbox.io.RandomAccessBufferedFileInputStream;
>>>>> 
>>>>> There are some additional classes and interfaces in the io package, e.g. 
>>>>> RandomAccessBufferedFileInputStream, which implements RandomAccessRead.
>>>>> 
>>>>> Any preferences or ideas for consolidating this? 
>>>>> 
>>>>> Currently I’m using RandomAccessBufferedFileInputStream, with some 
>>>>> additional implementations of RandomAccessRead to support reading from a 
>>>>> byte array for testing purposes.
>>>>> 
>>>>> BR
>>>>> 
>>>>> Maruan Sahyoun
>
