The streams used by BaseParser and PDFParser are sequential, so you can ignore them. Use of PushBackInputStream in the non-sequential parser seems a little odd.

We might want to think about getting rid of the classes in org.apache.pdfbox.io and replacing them with classes from java.nio.channels. It looks like the PDFBox classes pre-date NIO. With NIO we could use memory-mapped files, which for large PDF files should perform better than reading through an InputStream.

-- John

On 18 Feb 2014, at 03:53, Maruan Sahyoun <[email protected]> wrote:

> Hi,
>
> there are currently a number of different options to use as a base for a
> potential new parser/lexer. The ones currently in use are
>
> BaseParser:
> import org.apache.pdfbox.io.PushBackInputStream;
> import org.apache.pdfbox.io.RandomAccess;
>
> PDFParser (additional):
> import org.apache.pdfbox.io.RandomAccess;
>
> NonSequentialParser:
> import org.apache.pdfbox.io.PushBackInputStream;
> import org.apache.pdfbox.io.RandomAccess;
> import org.apache.pdfbox.io.RandomAccessBuffer;
> import org.apache.pdfbox.io.RandomAccessBufferedFileInputStream;
>
> There are some additional Classes/Interfaces in the io package e.g.
> RandomAccessBufferedFileInputStream implementing RandomAccessRead
>
> Any preferences, ideas of consolidating this?
>
> Currently I’m using RandomAccessBufferedFileInputStream (with some additional
> implementations of RandomAccessRead to support reading from a ByteArray for
> testing purposes)
>
> BR
>
> Maruan Sahyoun
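
A rough sketch of the memory-mapped idea, for illustration only. MappedFileReader and its methods are hypothetical names, not existing PDFBox classes; the only real APIs used are FileChannel.map and MappedByteBuffer from the JDK. It assumes the file fits in a single mapping (at most Integer.MAX_VALUE bytes); larger files would need several mapped regions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not part of PDFBox): random-access reads over a memory-mapped file. */
final class MappedFileReader {
    private final MappedByteBuffer buffer;

    MappedFileReader(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            // Assumption: the file is no larger than Integer.MAX_VALUE bytes.
            buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    /** Number of bytes in the mapped file. */
    long length() {
        return buffer.capacity();
    }

    /** Returns the unsigned byte at the absolute offset, or -1 at or past the end. */
    int read(int position) {
        return position < buffer.capacity() ? buffer.get(position) & 0xFF : -1;
    }

    /** Copies up to dst.length bytes starting at the absolute offset; returns the count, or -1 at end. */
    int read(int position, byte[] dst) {
        int n = Math.min(dst.length, buffer.capacity() - position);
        if (n <= 0) {
            return -1;
        }
        // Work on a duplicate so the shared buffer's position is never changed.
        ByteBuffer view = buffer.duplicate();
        view.position(position);
        view.get(dst, 0, n);
        return n;
    }
}
```

Because every read takes an absolute offset, a parser built on something like this can jump to the xref table or an object offset without any pushback or seek bookkeeping. Whether it is actually faster than the buffered stream classes would need measuring.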
