The streams used by BaseParser and PDFParser are sequential, so you can
ignore them. The use of PushBackInputStream in the non-sequential parser
seems a little odd.

We might want to think about getting rid of the classes in
org.apache.pdfbox.io and replacing them with classes from
java.nio.channels. It looks like the PDFBox classes pre-date NIO. With NIO
we could use memory-mapped files, which for large PDF files will perform
better than an InputStream.
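
As a rough illustration of the memory-mapped idea, here is a minimal sketch
using only standard java.nio calls. The class name MappedPdfSketch and the
helpers map and readAscii are hypothetical names made up for this example;
none of this is existing PDFBox code. It also assumes the file fits in a
single mapping, which FileChannel.map limits to Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not part of PDFBox.
public class MappedPdfSketch {

    // Maps the whole file read-only. The mapping remains valid after the
    // channel is closed. Assumes the file is smaller than 2 GB.
    public static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    // Reads `length` bytes starting at `offset` using absolute gets, so no
    // seek or push-back bookkeeping is needed.
    public static String readAscii(MappedByteBuffer buffer, int offset, int length) {
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = buffer.get(offset + i);
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        MappedByteBuffer buffer = map(Paths.get(args[0]));

        // Random access: look at the header first, then jump to the tail,
        // where a PDF keeps its startxref entry and %%EOF marker.
        int headerLength = Math.min(8, buffer.limit());
        System.out.println(readAscii(buffer, 0, headerLength));

        int tailLength = Math.min(32, buffer.limit());
        System.out.println(readAscii(buffer, buffer.limit() - tailLength, tailLength));
    }
}
```

The absolute get(int) calls are the point here: a non-sequential parser
could jump straight to any offset without needing a push-back stream.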

-- John

On 18 Feb 2014, at 03:53, Maruan Sahyoun <[email protected]> wrote:

> Hi,
> 
> there are currently a number of different options to use as a base for a 
> potential new parser/lexer. The ones currently in use are
> 
> BaseParser: 
> import org.apache.pdfbox.io.PushBackInputStream;
> import org.apache.pdfbox.io.RandomAccess;
> 
> PDFParser (additional):
> import org.apache.pdfbox.io.RandomAccess;
> 
> NonSequentialParser:
> import org.apache.pdfbox.io.PushBackInputStream;
> import org.apache.pdfbox.io.RandomAccess;
> import org.apache.pdfbox.io.RandomAccessBuffer;
> import org.apache.pdfbox.io.RandomAccessBufferedFileInputStream;
> 
> There are some additional classes/interfaces in the io package, e.g.
> RandomAccessBufferedFileInputStream, which implements RandomAccessRead.
> 
> Any preferences, or ideas for consolidating this?
> 
> Currently I’m using RandomAccessBufferedFileInputStream (with some
> additional implementations of RandomAccessRead to support reading from a
> byte array for testing purposes).
> 
> BR
> 
> Maruan Sahyoun
> 
> 
