I am continuing this thread on the dev list so as not to clutter JIRA issue
PDFBOX-847.
Mahesh Yadav commented on PDFBOX-847:
-------------------------------------
...
We use Jackrabbit. The only difference is that we have our own custom parser
(not provided by Jackrabbit) for parsing PDFs, and we interact with PDFBox
as shown below.
PDFParser parser = new PDFParser(new BufferedInputStream(stream));
// parse() has to run before the document is requested from the parser
parser.parse();
PDDocument document = parser.getPDDocument();
PDFTextStripper stripper = new PDFTextStripper();
stripper.setLineSeparator("\n");
stripper.writeText(document, writer);
I think we need to change the above approach and use "PDDocument.load" with
a RandomAccessFile.
If you set a temporary directory with
parser.setTempDirectory
before calling parse(), the parser will automatically use a temporary file
instead of a memory buffer.
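To illustrate the general idea of backing the data with a temporary file
rather than the heap, here is a minimal sketch using only plain JDK classes.
It is not PDFBox code and does not show how PDFBox implements this internally;
the class name TempFileSpool and the helper spool() are made up for this
example.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class TempFileSpool {

    // Hypothetical helper: copies the whole stream into a new temporary file
    // created inside tempDir, so the content lives on disk, not on the heap.
    static File spool(InputStream in, File tempDir) throws IOException {
        File tmp = File.createTempFile("spool", ".tmp", tempDir);
        Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the JVM default temp directory is writable.
        File tempDir = new File(System.getProperty("java.io.tmpdir"));
        byte[] data = "%PDF-1.4 example bytes".getBytes("US-ASCII");

        File tmp = spool(new ByteArrayInputStream(data), tempDir);
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "r")) {
            // Random access: jump to an offset without buffering the file.
            raf.seek(1);
            byte[] head = new byte[3];
            raf.readFully(head);
            System.out.println(new String(head, "US-ASCII")); // prints "PDF"
        } finally {
            // Remove the temporary file once it is no longer needed.
            tmp.delete();
        }
    }
}
```

The trade-off is the usual one: disk access is slower than memory, but heap
usage no longer grows with the size of the document.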
Timo
--
Timo Boehme
OntoChem GmbH
H.-Damerow-Str. 4
06120 Halle/Saale
T: +49 345 4780474
F: +49 345 4780471
[email protected]