[
https://issues.apache.org/jira/browse/TIKA-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18069833#comment-18069833
]
ASF GitHub Bot commented on TIKA-4704:
--------------------------------------
Copilot commented on code in PR #2726:
URL: https://github.com/apache/tika/pull/2726#discussion_r3014273018
##########
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-miscoffice-module/src/main/java/org/apache/tika/parser/odf/OpenDocumentParser.java:
##########
@@ -131,47 +132,52 @@ public void parse(InputStream stream, ContentHandler
baseHandler, Metadata metad
// Use a File if we can, and an already open zip is even better
ZipFile zipFile = null;
TikaInputStream tmpTis = null;
- if (stream instanceof TikaInputStream) {
- TikaInputStream tis = (TikaInputStream) stream;
- Object container = ((TikaInputStream) stream).getOpenContainer();
- if (container instanceof ZipFile) {
- zipFile = (ZipFile) container;
+ TemporaryResources tmp = new TemporaryResources();
+ try {
+ if (stream instanceof TikaInputStream) {
+ TikaInputStream tis = (TikaInputStream) stream;
+ Object container = ((TikaInputStream)
stream).getOpenContainer();
+ if (container instanceof ZipFile) {
+ zipFile = (ZipFile) container;
+ } else {
+ zipFile = new ZipFile(tis.getFile());
+ tis.setOpenContainer(zipFile);
+ }
} else {
- zipFile = new ZipFile(tis.getFile());
- tis.setOpenContainer(zipFile);
+ tmpTis = TikaInputStream.get(stream, tmp, metadata);
+ tmpTis.setOpenContainer(new ZipFile(tmpTis.getFile()));
+ zipFile = (ZipFile) tmpTis.getOpenContainer();
}
- } else {
- tmpTis = TikaInputStream.get(stream);
- tmpTis.setOpenContainer(new ZipFile(tmpTis.getFile()));
- zipFile = (ZipFile) tmpTis.getOpenContainer();
- }
- // Prepare to handle the content
- XHTMLContentHandler xhtml = new XHTMLContentHandler(baseHandler,
metadata);
- xhtml.startDocument();
- // As we don't know which of the metadata or the content
- // we'll hit first, catch the endDocument call initially
- EndDocumentShieldingContentHandler handler = new
EndDocumentShieldingContentHandler(xhtml);
+ // Prepare to handle the content
+ XHTMLContentHandler xhtml = new XHTMLContentHandler(baseHandler,
metadata);
+ xhtml.startDocument();
+ // As we don't know which of the metadata or the content
+ // we'll hit first, catch the endDocument call initially
+ EndDocumentShieldingContentHandler handler = new
EndDocumentShieldingContentHandler(xhtml);
- try {
try {
- handleZipFile(zipFile, metadata, context, handler,
embeddedDocumentUtil);
- } finally {
- //Do we want to close silently == catch an exception here?
- if (tmpTis != null) {
- //tmpTis handles closing of the open zip container
- tmpTis.close();
+ try {
+ handleZipFile(zipFile, metadata, context, handler,
embeddedDocumentUtil);
+ } finally {
+ //Do we want to close silently == catch an exception here?
+ if (tmpTis != null) {
+ //tmpTis handles closing of the open zip container
+ tmpTis.close();
+ }
}
+ } catch (SAXException e) {
+ if (e.getCause() instanceof EncryptedDocumentException) {
+ throw (EncryptedDocumentException)e.getCause();
+ }
+ throw e;
}
- } catch (SAXException e) {
- if (e.getCause() instanceof EncryptedDocumentException) {
- throw (EncryptedDocumentException)e.getCause();
- }
- throw e;
- }
- // Only now call the end document
- if (handler.isEndDocumentWasCalled()) {
- handler.reallyEndDocument();
+ // Only now call the end document
+ if (handler.isEndDocumentWasCalled()) {
+ handler.reallyEndDocument();
+ }
+ } finally {
+ tmp.dispose();
}
Review Comment:
`tmp.dispose()` in the outer `finally` can mask an
`IOException`/`SAXException` thrown during parsing if disposal itself fails
(because `dispose()` throws a new `TikaException`). Consider switching to
try-with-resources for `TemporaryResources` (using `close()`), or otherwise
ensure disposal exceptions are suppressed/added rather than overriding the
original failure.
> Avoid remaining temp files
> --------------------------
>
> Key: TIKA-4704
> URL: https://issues.apache.org/jira/browse/TIKA-4704
> Project: Tika
> Issue Type: Task
> Affects Versions: 3.3.0
> Reporter: Tilman Hausherr
> Priority: Minor
> Fix For: 4.0.0, 3.3.1
>
> Attachments: screenshot-1.png
>
>
> This is my temp directory after a successful build of tika 3. We should try
> to lessen this.
> !screenshot-1.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)