Fabian Lange created TIKA-1474:
----------------------------------

             Summary: PackageParser leaves 7zip Temp Files behind
                 Key: TIKA-1474
                 URL: https://issues.apache.org/jira/browse/TIKA-1474
             Project: Tika
          Issue Type: Bug
          Components: parser
            Reporter: Fabian Lange


When a 7z input stream is passed to the Tika parser, PackageParser spools it
to a temp file:

{code}
        ArchiveInputStream ais;
        try {
            ArchiveStreamFactory factory = context.get(
                    ArchiveStreamFactory.class, new ArchiveStreamFactory());
            ais = factory.createArchiveInputStream(stream);
        } catch (StreamingNotSupportedException sne) {
            // Most archive formats work on streams, but a few need files
            if (sne.getFormat().equals(ArchiveStreamFactory.SEVEN_Z)) {
                // Rework as a file, and wrap
                stream.reset();
                TikaInputStream tstream = TikaInputStream.get(stream);
                
                // Pending a fix for COMPRESS-269, this bit is a little nasty
                ais = new SevenZWrapper(new SevenZFile(tstream.getFile()));
            } else {
                throw new TikaException(
                        "Unknown non-streaming format " + sne.getFormat(), sne);
            }
        } catch (ArchiveException e) {
            throw new TikaException("Unable to unpack document stream", e);
        }
{code}

tstream.getFile() then creates a new temp file internally:

{code}
                // Spool the entire stream into a temporary file
                file = tmp.createTemporaryFile();
                OutputStream out = new FileOutputStream(file);
{code}

This file is not deleted, because SevenZWrapper never closes the SevenZFile.
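
To illustrate the lifetime of such a spooled file, here is a minimal
stdlib-only sketch. SpooledStream is a hypothetical stand-in, not Tika's
TikaInputStream, and it assumes the temp file is removed only when its owner
is closed. If nothing ever calls close(), the file stays on disk.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical stand-in for a stream that spools itself to a temp file. */
final class SpooledStream implements Closeable {
    private final Path file;

    SpooledStream(InputStream in) throws IOException {
        // Spool the entire stream into a temporary file
        file = Files.createTempFile("spool-", ".tmp");
        Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Path of the spooled copy; valid until close() is called. */
    Path getFile() {
        return file;
    }

    @Override
    public void close() throws IOException {
        // The temp file goes away only here; skipping close() leaks it
        Files.deleteIfExists(file);
    }
}
```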

The leak can be fixed by implementing the following close method in SevenZWrapper:

{code}
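@Override
public void close() throws IOException {
    try {
        // Close the wrapped SevenZFile
        file.close();
    } finally {
        // Always close the stream itself, even if file.close() throws
        super.close();
    }
}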
{code}
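
The same try/finally pattern can be tried out without Tika or Commons
Compress. The sketch below is only an illustration: ResourceWrapper and its
resource field are hypothetical names, with the Closeable playing the role of
the SevenZFile. It shows that the wrapper's own close still runs when closing
the backing resource throws.

```java
import java.io.Closeable;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: a stream view over a separately opened resource. */
class ResourceWrapper extends FilterInputStream {
    // Plays the role of the SevenZFile held by SevenZWrapper
    private final Closeable resource;

    ResourceWrapper(InputStream view, Closeable resource) {
        super(view);
        this.resource = resource;
    }

    @Override
    public void close() throws IOException {
        try {
            // Release the backing resource first
            resource.close();
        } finally {
            // Runs even if resource.close() throws; the exception still propagates
            super.close();
        }
    }
}
```

A caller that wraps the stream in try-with-resources would then release both
the stream and the backing resource in one step.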



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
