Fabian Lange created TIKA-1474:
----------------------------------
Summary: PackageParser leaves 7zip Temp Files behind
Key: TIKA-1474
URL: https://issues.apache.org/jira/browse/TIKA-1474
Project: Tika
Issue Type: Bug
Components: parser
Reporter: Fabian Lange
If I pass a 7z input stream to the Tika parser, Tika creates a temp file in
PackageParser:
{code}
ArchiveInputStream ais;
try {
    ArchiveStreamFactory factory = context.get(
            ArchiveStreamFactory.class, new ArchiveStreamFactory());
    ais = factory.createArchiveInputStream(stream);
} catch (StreamingNotSupportedException sne) {
    // Most archive formats work on streams, but a few need files
    if (sne.getFormat().equals(ArchiveStreamFactory.SEVEN_Z)) {
        // Rework as a file, and wrap
        stream.reset();
        TikaInputStream tstream = TikaInputStream.get(stream);
        // Pending a fix for COMPRESS-269, this bit is a little nasty
        ais = new SevenZWrapper(new SevenZFile(tstream.getFile()));
    } else {
        throw new TikaException("Unknown non-streaming format " +
                sne.getFormat(), sne);
    }
} catch (ArchiveException e) {
    throw new TikaException("Unable to unpack document stream", e);
}
{code}
tstream.getFile() then internally creates a new temp file:
{code}
// Spool the entire stream into a temporary file
file = tmp.createTemporaryFile();
OutputStream out = new FileOutputStream(file);
{code}
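For illustration, the spool-then-clean-up life cycle can be sketched with plain JDK calls. This is a minimal sketch, not Tika's implementation: the class name TempSpoolSketch and both helper methods are hypothetical, and java.nio.file.Files stands in for Tika's temporary-resource handling.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempSpoolSketch {

    /** Copies the whole stream into a new temporary file and returns its path. */
    static Path spoolToTempFile(InputStream in) throws IOException {
        // createTempFile already creates an empty file, so the copy must replace it
        Path tmp = Files.createTempFile("spool-sketch-", ".tmp");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /**
     * Spools the data, uses the file, then deletes it.
     * Returns true if the temporary file no longer exists afterwards.
     */
    static boolean spoolAndCleanUp(byte[] data) throws IOException {
        Path tmp = spoolToTempFile(new ByteArrayInputStream(data));
        try {
            // Stand-in for the real work done on the file-backed archive
            if (Files.size(tmp) != data.length) {
                throw new IOException("Spooled file has unexpected size");
            }
        } finally {
            // The clean-up step: without it the file stays in the temp directory
            Files.deleteIfExists(tmp);
        }
        return !Files.exists(tmp);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("temp file removed: " + spoolAndCleanUp(new byte[] {1, 2, 3}));
    }
}
```

Whoever spools the stream owns the resulting file, so the delete sits in a finally block and runs even if the work on the file fails.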
The temp file is not deleted because SevenZWrapper does not close the SevenZFile.
This can be fixed by implementing the following close method in SevenZWrapper:
{code}
public void close() throws IOException {
    try {
        file.close();
    } finally {
        super.close();
    }
}
{code}
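The effect of such a close method can be checked in isolation. The sketch below is self-contained and does not use Tika or Commons Compress: TrackedResource is a hypothetical stand-in for SevenZFile, and WrapperStream is a hypothetical stand-in for SevenZWrapper. It only shows that closing the wrapper also closes the wrapped resource.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

public class CloseDelegationSketch {

    /** Hypothetical stand-in for a file-backed archive such as SevenZFile. */
    static class TrackedResource implements Closeable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for SevenZWrapper: a stream over a file-backed resource. */
    static class WrapperStream extends InputStream {
        private final TrackedResource file;

        WrapperStream(TrackedResource file) {
            this.file = file;
        }

        @Override
        public int read() {
            return -1; // no data; only the close behaviour matters here
        }

        @Override
        public void close() throws IOException {
            try {
                file.close();   // release the wrapped resource first
            } finally {
                super.close();  // then let the stream itself clean up
            }
        }
    }

    /** Returns true if closing the wrapper also closed the wrapped resource. */
    static boolean closesUnderlying() throws IOException {
        TrackedResource resource = new TrackedResource();
        try (WrapperStream in = new WrapperStream(resource)) {
            in.read();
        }
        return resource.closed;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("underlying resource closed: " + closesUnderlying());
    }
}
```

The try/finally ordering matches the proposed fix: the wrapped resource is closed first, and super.close() still runs if that close throws.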