+1, as I'm usually in favour of using the Java library.

Tilman

On 18.02.2014 at 21:42, John Hewson wrote:
The streams used by BaseParser and PDFParser are sequential, so you can
ignore them. Use of PushBackInputStream in the non-sequential parser seems
a little odd.

We might want to think about getting rid of the classes in
org.apache.pdfbox.io and replacing them with classes from
java.nio.channels. It looks like the PDFBox classes pre-date NIO.
With NIO we could use memory-mapped files, which should perform better
than an InputStream for large PDF files.
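
For illustration, a minimal sketch of what the memory-mapped approach could
look like, using only java.nio from the JDK. The class and method names
(MappedPdfSketch, map, readAscii) are made up for this example and are not
part of PDFBox. It assumes the file fits into a single mapping, which is
limited to Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not PDFBox code.
class MappedPdfSketch {

    /** Maps the whole file read-only (assumes size <= Integer.MAX_VALUE). */
    public static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    /** Reads 'length' bytes at an absolute offset; no stream position is moved. */
    public static String readAscii(MappedByteBuffer buffer, int offset, int length) {
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = buffer.get(offset + i); // absolute get
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny stand-in file so the example is self-contained.
        Path file = Files.createTempFile("sketch", ".pdf");
        file.toFile().deleteOnExit();
        Files.write(file, "%PDF-1.4\n%%EOF\n".getBytes(StandardCharsets.US_ASCII));

        MappedByteBuffer buffer = map(file);
        System.out.println(readAscii(buffer, 0, 8));                      // header
        System.out.println(readAscii(buffer, buffer.limit() - 6, 5));     // trailer marker
    }
}
```

Because every read is by absolute offset, jumping to the end of the file
and back needs no pushback or re-opening of a stream.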

-- John

On 18 Feb 2014, at 03:53, Maruan Sahyoun <[email protected]> wrote:

Hi,

there are currently a number of different options to use as a base for a 
potential new parser/lexer. The ones currently in use are

BaseParser:
import org.apache.pdfbox.io.PushBackInputStream;
import org.apache.pdfbox.io.RandomAccess;

PDFParser (additional):
import org.apache.pdfbox.io.RandomAccess;

NonSequentialParser:
import org.apache.pdfbox.io.PushBackInputStream;
import org.apache.pdfbox.io.RandomAccess;
import org.apache.pdfbox.io.RandomAccessBuffer;
import org.apache.pdfbox.io.RandomAccessBufferedFileInputStream;

There are some additional classes/interfaces in the io package, e.g.
RandomAccessBufferedFileInputStream, which implements RandomAccessRead.

Any preferences or ideas for consolidating this?

Currently I’m using RandomAccessBufferedFileInputStream, with some
additional implementations of RandomAccessRead to support reading from a
byte array for testing purposes.
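
To give an idea of the byte array variant, here is a minimal sketch. The
SeekableSource interface below is a hypothetical stand-in defined only for
this example; its method set is an assumption and should not be read as
the actual RandomAccessRead signatures in PDFBox.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not PDFBox code.
class ByteArraySourceSketch {

    /** Stand-in for a random-access read interface (assumed method set). */
    interface SeekableSource {
        int read();                 // next byte as 0..255, or -1 at end of data
        void seek(long position);   // move to an absolute offset
        long position();
        long length();
    }

    /** In-memory implementation, handy for parser unit tests. */
    static final class ByteArraySource implements SeekableSource {
        private final byte[] data;
        private int pos;

        ByteArraySource(byte[] data) {
            this.data = data.clone(); // defensive copy
        }

        @Override public int read() {
            return pos < data.length ? data[pos++] & 0xFF : -1;
        }

        @Override public void seek(long position) {
            if (position < 0 || position > data.length) {
                throw new IllegalArgumentException("position out of range: " + position);
            }
            pos = (int) position;
        }

        @Override public long position() { return pos; }

        @Override public long length() { return data.length; }
    }

    public static void main(String[] args) {
        SeekableSource src =
                new ByteArraySource("%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        int first = src.read();
        src.seek(src.position() - 1); // stepping back one byte replaces unread()
        System.out.println((char) first + " " + (char) src.read());
    }
}
```

With an absolute seek available, a lexer can step back by repositioning,
so a separate pushback stream is not needed in the test setup.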

BR

Maruan Sahyoun



