Pascal Essiembre created PDFBOX-2723:
----------------------------------------
Summary: PDFBox*.tmp files not deleted by COSParser
Key: PDFBOX-2723
URL: https://issues.apache.org/jira/browse/PDFBOX-2723
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 2.0.0
Environment: Windows and Linux, with issue being critical on Linux
Reporter: Pascal Essiembre
Fix For: 2.0.0
When parsing PDFs, temporary files get created under the system temp directory
(e.g. PDFBox6525369863339991063.tmp). All but one of the files created for each
document are deleted, so every document parsed leaves behind a tmp file that
never gets deleted. This is likely due to a stream that is never closed. When
processing many PDFs on Linux in the same JVM instance, we eventually get the
fatal error "Too many open files". Raising the OS limit on open file handles is
not always an option.
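To see how many of these files have piled up, something like the following can be used. It is only a diagnostic sketch: the class name {{LeftoverTempFiles}} is made up for illustration, and it assumes the files sit directly in the directory named by the {{java.io.tmpdir}} system property and match the PDFBox*.tmp pattern described above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of PDFBox) that lists leftover PDFBox*.tmp files. */
final class LeftoverTempFiles {

    /** Returns the entries directly under dir whose name matches PDFBox*.tmp. */
    static List<Path> list(Path dir) throws IOException {
        List<Path> found = new ArrayList<>();
        // The glob is applied to file names only; the directory stream is closed on exit.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "PDFBox*.tmp")) {
            for (Path entry : entries) {
                found.add(entry);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the temp files live in the JVM's default temp directory.
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        List<Path> leftovers = list(tmp);
        System.out.println(leftovers.size() + " PDFBox*.tmp file(s) in " + tmp);
    }
}
```

Running it before and after parsing a batch of PDFs should show the count growing by one per document if the leak is present.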
I was able to fix this by modifying the {{COSParser}} class to close a
{{COSStream}} instance:
{code:title=COSParser.java, starting on line 312|borderStyle=solid}
private long parseXrefObjStream(long objByteOffset, boolean isStandalone)
        throws IOException
{
    // ---- parse indirect object head
    readObjectNumber();
    readGenerationNumber();
    readExpectedString(OBJ_MARKER, true);

    COSDictionary dict = parseCOSDictionary();
    COSStream xrefStream = parseCOSStream(dict);
    parseXrefStream(xrefStream, (int) objByteOffset, isStandalone);
    xrefStream.close(); // <--- *** NEW LINE ***

    return dict.getLong(COSName.PREV);
}
{code}
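The general pattern behind this fix can be shown without PDFBox. The sketch below is not PDFBox code: {{TempBackedStream}} and {{TempFileLeakDemo}} are hypothetical stand-ins, and the assumption that closing the stream is what releases and deletes its backing temp file is mine, based on the behaviour described above. It contrasts a method that always closes the stream (via try-with-resources, so it also closes when parsing throws) with one that never does.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a stream that buffers its data in a temp file. */
final class TempBackedStream implements Closeable {
    private final Path file;
    private final OutputStream out;

    TempBackedStream() throws IOException {
        file = Files.createTempFile("PDFBoxDemo", ".tmp");
        out = Files.newOutputStream(file);
    }

    Path file() {
        return file;
    }

    void write(byte[] data) throws IOException {
        out.write(data);
    }

    /** Releases the file handle and then removes the backing temp file. */
    @Override
    public void close() throws IOException {
        try {
            out.close();
        } finally {
            Files.deleteIfExists(file);
        }
    }
}

final class TempFileLeakDemo {

    /** Uses the stream and always closes it, even if write() throws. */
    static Path parseAndClose(byte[] data) throws IOException {
        try (TempBackedStream stream = new TempBackedStream()) {
            stream.write(data);
            return stream.file();
        }
    }

    /** Uses the stream but never closes it: the handle and the file stay behind. */
    static Path parseAndLeak(byte[] data) throws IOException {
        TempBackedStream stream = new TempBackedStream();
        stream.write(data);
        return stream.file();
    }

    public static void main(String[] args) throws IOException {
        Path closed = parseAndClose(new byte[] {1, 2, 3});
        Path leaked = parseAndLeak(new byte[] {1, 2, 3});
        System.out.println("file left after closing variant: " + Files.exists(closed));
        System.out.println("file left after leaking variant: " + Files.exists(leaked));
        Files.deleteIfExists(leaked); // tidy up the demo's own leftover
    }
}
```

If the same idea were applied to {{parseXrefObjStream}}, wrapping the {{parseXrefStream}} call in try/finally would also cover the case where parsing throws before the new {{close()}} line is reached.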