[
https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510207#comment-15510207
]
Tim Allison commented on TIKA-2069:
-----------------------------------
[~jeffswindle], I should point out that the VBAMacroReader is still relatively
new in POI, and there are currently three open bugs against it, one of which is
triggered by the docm file that you submitted:
* [60158|https://bz.apache.org/bugzilla/show_bug.cgi?id=60158]
* [59830|https://bz.apache.org/bugzilla/show_bug.cgi?id=59830]
* [59858|https://bz.apache.org/bugzilla/show_bug.cgi?id=59858]
For now, we'll swallow the exceptions in Tika, but there's much more work to be
done. Patches to POI would be welcomed! :)
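
To illustrate what "swallow the exceptions" means here, a minimal sketch follows. It is not the actual Tika code: MacroSource and LenientMacroExtraction are hypothetical names invented for this example. MacroSource stands in for a reader such as POI's VBAMacroReader, whose readMacros() returns a map of module name to macro source. The sketch also assumes that a failed read should yield no macro text rather than abort the parse.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for a macro reader; not a Tika or POI type.
interface MacroSource {
    // Returns module name -> VBA source text, or throws on a malformed project.
    Map<String, String> readMacros() throws IOException;
}

final class LenientMacroExtraction {
    private LenientMacroExtraction() {}

    // Returns whatever the source yields, or an empty map if reading fails.
    static Map<String, String> extractLeniently(MacroSource source) {
        try {
            return source.readMacros();
        } catch (IOException | RuntimeException e) {
            // Swallowed on purpose (assumption): a macro-reading bug should
            // not prevent the caller from extracting the rest of the document.
            return Collections.emptyMap();
        }
    }
}
```

The trade-off of this approach is that a document hitting one of the bugs above silently produces no macro text, so callers cannot tell "no macros" apart from "macros could not be read" unless the failure is also logged or recorded somewhere.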
> Extract Macro text from Microsoft Office documents
> --------------------------------------------------
>
> Key: TIKA-2069
> URL: https://issues.apache.org/jira/browse/TIKA-2069
> Project: Tika
> Issue Type: Improvement
> Components: detector, parser
> Affects Versions: 1.13
> Environment: RHEL 5.x, Apache Tomcat
> Reporter: Jeff Swindle
> Labels: features
> Attachments: excel-macro.PNG, test-macro-doc.docm,
> test-macro-doc.docm-tika-app-output.txt, word-macro.PNG, xlsmacro.xlsm,
> xlsmacro.xlsm.tika-app-output.txt
>
>
> Tika supports macro-enabled Microsoft Office documents by extracting metadata
> and contents; however, macros within the document do not appear in the
> metadata or content output.
> The desire is to have the macro text extracted as well.
> Info regarding macro extraction: http://www.decalage.info/vba_tools