[ https://issues.apache.org/jira/browse/PDFBOX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935447#comment-14935447 ]
Tilman Hausherr commented on PDFBOX-2883:
-----------------------------------------
Consider this code:
{code}
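import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.junit.Assert;
import org.junit.Test;

// Test class name is illustrative
public class DeleteAfterFailedLoadTest
{
    @Test
    public void DeleteTest() throws FileNotFoundException
    {
        // Write a file that is not a PDF, so that load() must fail
        File f = new File("test.pdf");
        PrintWriter pw = new PrintWriter(new FileOutputStream(f));
        pw.write("<script language='JavaScript'>");
        pw.close();

        PDDocument doc = null;
        try
        {
            doc = PDDocument.load(f);
            Assert.fail("parsing should fail");
        }
        catch (IOException ex)
        {
            // expected
        }
        finally
        {
            Assert.assertNull(doc);
        }

        // Deleting only succeeds if load() released the file
        try
        {
            Files.delete(f.toPath());
        }
        catch (IOException ex)
        {
            Assert.fail("delete file after failed load() failed");
        }
    }
}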
{code}
The test fails: after load() fails, the file cannot be deleted because it is still in use (the file opened by load() is never closed).
One solution would be to change PDDocument.load(File file, ...) like this:
{code}
public static PDDocument load(File file, String password, InputStream keyStore, String alias,
                              MemoryUsageSetting memUsageSetting) throws IOException
{
    RandomAccessBufferedFileInputStream raFile = new RandomAccessBufferedFileInputStream(file);
    try
    {
        // Create the parser inside the try so the file is also closed
        // if constructing the parser or the scratch file throws
        PDFParser parser = new PDFParser(raFile, password, keyStore, alias,
                new ScratchFile(memUsageSetting));
        parser.parse();
        return parser.getPDDocument();
    }
    finally
    {
        // Release the file handle whether or not parsing succeeded
        raFile.close();
    }
}
{code}
Surprisingly, this works for me. But is it correct? That is, is the original file really no longer used once parsing has succeeded?
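If the returned document does keep reading from the file after parse() (for example, if some objects or streams are only loaded on demand), closing the file in a finally block would break later access. A common alternative is to close the source only when loading fails and otherwise hand ownership to the returned document, which closes it in its own close(). The sketch below shows that pattern in plain Java under stated assumptions: `LazyDoc`, `TrackingStream` and `CloseOnFailure.load` are hypothetical stand-ins, not PDFBox classes, and the "%PDF" header check is only a placeholder for real parsing.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a document that may keep reading from its source
// after loading succeeds; it owns the stream and closes it in close().
class LazyDoc implements AutoCloseable {
    private final InputStream source;

    LazyDoc(InputStream source) {
        this.source = source;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }
}

// Hypothetical stream wrapper that records whether close() was called.
class TrackingStream extends ByteArrayInputStream {
    boolean closed = false;

    TrackingStream(byte[] data) {
        super(data);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

class CloseOnFailure {
    // Hypothetical loader: "parsing" fails unless the data starts with "%PDF".
    static LazyDoc load(InputStream in) throws IOException {
        try {
            byte[] header = new byte[4];
            int n = in.readNBytes(header, 0, 4);
            if (n != 4 || !new String(header, StandardCharsets.US_ASCII).equals("%PDF")) {
                throw new IOException("not a PDF header");
            }
            // Success: the returned document now owns the stream and closes it later
            return new LazyDoc(in);
        } catch (IOException | RuntimeException e) {
            // Failure: the caller never gets an object to close, so close it here
            try {
                in.close();
            } catch (IOException suppressed) {
                e.addSuppressed(suppressed);
            }
            throw e;
        }
    }
}
{code}

The rule behind this design is ownership: whoever opens a resource must close it on every path where nobody else takes it over. On failure that is the loader; on success it becomes the returned document.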
> Unify memory handling
> ---------------------
>
> Key: PDFBOX-2883
> URL: https://issues.apache.org/jira/browse/PDFBOX-2883
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 2.0.0
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Fix For: 2.0.0
>
> Attachments: MemoryUsage.java
>
>
> PDFBox now has at least two different mechanisms for choosing between main
> memory and keeping large data in a temporary file: when an input stream is
> provided, it is copied to a temporary file, and all read PDF streams are
> handled by RandomAccessBuffer/ScratchFile.
> In PDFBOX-2882 I've re-implemented ScratchFile so that it is quite fast and
> allows setting a maximum amount of memory to be used for its pages before it
> starts using the scratch file. This implementation could be used as the
> general 'backend' for all buffered streams and even the file input stream
> copy. As long as the PDF fits into the allowed maximum memory, it should be
> as fast as RandomAccessBuffer, while still giving good control over memory
> usage by switching to the scratch file when needed. This prevents OOM
> (OutOfMemoryError) with large files.
> In order to use this, the PDDocument methods should be changed to take a
> MemoryHandling object describing the buffering strategy (whether to use a
> ScratchFile, how much main memory may be used, ...) instead of a
> 'useScratchFile' parameter.
> I've opened this issue for discussion. Since it requires API changes in
> PDDocument, it should be done before the 2.0 release.
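The buffering strategy described in the quoted issue (keep data in main memory up to a configured maximum, then continue in a scratch file) can be sketched in plain Java. `MemorySetting` and `SpillingBuffer` are hypothetical names for illustration only; they are not PDFBox's `MemoryUsageSetting` or `ScratchFile`, whose page-based implementation is more involved.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical buffering policy (not PDFBox's MemoryUsageSetting API).
final class MemorySetting {
    final long maxMainMemoryBytes;  // bytes kept on the heap before spilling
    final boolean allowScratchFile; // whether spilling to a temp file is allowed

    MemorySetting(long maxMainMemoryBytes, boolean allowScratchFile) {
        this.maxMainMemoryBytes = maxMainMemoryBytes;
        this.allowScratchFile = allowScratchFile;
    }
}

// Hypothetical append-only buffer: keeps data in memory up to the limit, then
// copies everything to a temporary file and continues appending there.
final class SpillingBuffer implements AutoCloseable {
    private final MemorySetting setting;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path scratch;            // null while all data is still in memory
    private OutputStream scratchOut;
    private long size;

    SpillingBuffer(MemorySetting setting) {
        this.setting = setting;
    }

    void write(byte[] data) throws IOException {
        if (scratchOut == null && size + data.length > setting.maxMainMemoryBytes) {
            if (!setting.allowScratchFile) {
                throw new IOException("memory limit exceeded and scratch file not allowed");
            }
            scratch = Files.createTempFile("scratch", ".tmp");
            scratchOut = Files.newOutputStream(scratch);
            memory.writeTo(scratchOut); // move what was buffered so far
            memory = null;              // drop the heap copy
        }
        if (scratchOut != null) {
            scratchOut.write(data);
        } else {
            memory.write(data);
        }
        size += data.length;
    }

    boolean isOnDisk() {
        return scratchOut != null;
    }

    long size() {
        return size;
    }

    byte[] toByteArray() throws IOException {
        if (scratchOut == null) {
            return memory.toByteArray();
        }
        scratchOut.flush();
        return Files.readAllBytes(scratch);
    }

    @Override
    public void close() throws IOException {
        if (scratchOut != null) {
            try {
                scratchOut.close();
            } finally {
                Files.deleteIfExists(scratch);
            }
        }
    }
}
{code}

Copying the already-buffered bytes to the file on the first overflow keeps all data in one place, at the cost of a one-time copy. A single settings object like this, passed down to every buffer, is the kind of parameter the issue suggests in place of a boolean 'useScratchFile' flag.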
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)