I am continuing this thread on the dev list so as not to clutter JIRA issue PDFBOX-847.

Mahesh Yadav commented on PDFBOX-847:
-------------------------------------
...
We use Jackrabbit; the only difference is that we have our own custom 
parser (not provided by Jackrabbit) for parsing PDFs, and we interact with PDFBox 
as shown below.

// stream (InputStream) and writer (Writer) are supplied by the caller
PDFParser parser = new PDFParser(new BufferedInputStream(stream));
parser.parse(); // must run before getPDDocument()
PDDocument document = parser.getPDDocument();
PDFTextStripper stripper = new PDFTextStripper();
stripper.setLineSeparator("\n");
stripper.writeText(document, writer);

I think we need to change the above approach and use "PDDocument.load" with 
a RandomAccessFile.

If you set a temporary directory with parser.setTempDirectory before calling
parse(), the parser will automatically use a temporary file instead of a
memory buffer.
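
To illustrate the general idea only (this is not PDFBox's own code), here is a
minimal sketch using plain JDK classes. It spools an input stream into a
temporary file inside a chosen directory instead of holding the bytes in
memory. The class name TempFileSpool, the method spoolToTempFile, and the
"pdf-scratch-" file prefix are hypothetical names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of PDFBox: shows file-backed buffering.
class TempFileSpool {

    /** Copies the stream into a new temporary file inside tempDir and returns that file. */
    static File spoolToTempFile(InputStream in, File tempDir) throws IOException {
        // createTempFile already creates an empty file, so the copy must replace it.
        Path tmp = Files.createTempFile(tempDir.toPath(), "pdf-scratch-", ".tmp");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp.toFile();
    }

    public static void main(String[] args) throws IOException {
        File tempDir = Files.createTempDirectory("spool-demo").toFile();
        byte[] data = "placeholder bytes standing in for a PDF".getBytes("US-ASCII");

        File spooled = spoolToTempFile(new ByteArrayInputStream(data), tempDir);
        System.out.println("Spooled " + spooled.length() + " bytes to " + spooled);

        // Clean up the scratch file and directory once they are no longer needed.
        spooled.delete();
        tempDir.delete();
    }
}
```

Whoever creates the scratch file is responsible for deleting it afterwards, as
the main method above does.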


Timo

--

 Timo Boehme
 OntoChem GmbH
 H.-Damerow-Str. 4
 06120 Halle/Saale
 T: +49 345 4780474
 F: +49 345 4780471
 [email protected]

_____________________________________________________________________

 OntoChem GmbH
 Geschäftsführer: Dr. Lutz Weber
 Sitz: Halle / Saale
 Registergericht: Stendal
 Registernummer: HRB 215461
_____________________________________________________________________
