[
https://issues.apache.org/jira/browse/TIKA-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219386#comment-14219386
]
Milan Zivkovic commented on TIKA-1473:
--------------------------------------
I am not really sure how to clean the sensitive data. If you can help me I
would gladly do that.
If I re-save the document in Word (even without changing anything), I cannot
reproduce the problem with the newly created document.
I also tried unzipping the file, leaving the structure untouched, and zipping
it again, but the re-zipped document does not reproduce the problem either.
Maybe I am doing something wrong here?
If I run the Linux file command on the original file, I get "Microsoft Word
2007+". If I unzip and zip it again, the file command reports "Microsoft
OOXML" instead.
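
One way to see what changes when the document is unzipped and zipped again is
to list the archive entries in the order they are physically stored. The
different output from the file command may come from a different entry order
in the re-zipped archive, but that is an assumption, not something confirmed
in this issue. The sketch below uses only java.util.zip; the class name
ZipEntryOrder and the file names in the usage line are made up for
illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not part of Tika.
public class ZipEntryOrder {

    /** Returns the entry names in the order they are stored in the archive. */
    public static List<String> entryNames(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        // ZipInputStream walks the local file headers sequentially, so the
        // names come back in storage order rather than sorted order.
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Print the entry list of every archive given on the command line.
        for (String path : args) {
            System.out.println(path);
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                for (String name : entryNames(in)) {
                    System.out.println("  " + name);
                }
            }
        }
    }
}
```

Running it as `java ZipEntryOrder original.docx rezipped.docx` prints both
entry lists so they can be compared side by side. Entry names are not
sensitive, so the listing could be attached to the issue even if the document
itself cannot.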
> Apache Tika is not working for .docx documents
> -----------------------------------------------
>
> Key: TIKA-1473
> URL: https://issues.apache.org/jira/browse/TIKA-1473
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.5, 1.6
> Reporter: Franco Catto
> Priority: Blocker
>
> I am using Apache Tika 1.6 to read different document files.
> It reads PDF and old-format .doc files, but when I try to read a .docx file,
> it gives me the following exception:
> org.apache.tika.exception.TikaException: Failed to close temporary resources
> at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:152)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:127)
> ...
> The resource cannot be closed because it is still being used by the Java
> process, certainly by the OOXML parser.