[
https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477188#comment-15477188
]
Tim Allison edited comment on TIKA-2069 at 9/9/16 4:19 PM:
-----------------------------------------------------------
Thank you!
This question is for [~jeffswindle] and fellow Tika devs (esp. [~rgauss] and
[~gagravarr]). Should we:
1) add macro text as metadata items (e.g. msoffice:macro),
2) inline them in the content via <div> elements, or
3) treat them as embedded documents (what would the mime type be?)
I'd prefer option 1 or 3. Option 1 is probably simpler for end users, but
option 3 would let us capture metadata about each macro.
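To make the trade-off between options 1 and 3 concrete, here is a rough
plain-Java sketch. It deliberately does not use the Tika API: the Macro class,
the method names and the "resourceName" and "content" keys are made up for
illustration, and only the msoffice:macro key comes from the list above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these types are part of Tika.
final class MacroOptionsSketch {

    // Hypothetical holder for one extracted macro module.
    static final class Macro {
        final String name;
        final String source;

        Macro(String name, String source) {
            this.name = name;
            this.source = source;
        }
    }

    // Option 1: every macro body becomes one value under a single
    // multi-valued metadata key. The module names are not kept.
    static Map<String, List<String>> asMetadataItems(List<Macro> macros) {
        List<String> values = new ArrayList<>();
        for (Macro m : macros) {
            values.add(m.source);
        }
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("msoffice:macro", values);
        return metadata;
    }

    // Option 3: each macro is modelled as its own embedded document,
    // so it can carry its own metadata (here just a hypothetical name key).
    static List<Map<String, String>> asEmbeddedDocuments(List<Macro> macros) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Macro m : macros) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("resourceName", m.name); // assumed key, not a Tika constant
            doc.put("content", m.source);
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Macro> macros = new ArrayList<>();
        macros.add(new Macro("Module1", "Sub Hello()\nEnd Sub"));
        macros.add(new Macro("ThisDocument", "Sub AutoOpen()\nEnd Sub"));

        System.out.println(asMetadataItems(macros));
        System.out.println(asEmbeddedDocuments(macros));
    }
}
```

With option 1 a caller gets a flat list of macro bodies under one key; with
option 3 each macro keeps its own name (and could carry a mime type or other
per-macro metadata), at the cost of callers having to handle embedded documents.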
-[~jeffswindle], the title of this issue is for msoffice...is it ok to limit
this to ooxml? Do you need this for the older doc and xls?- Already handled by
POI at no extra cost. :)
was (Author: [email protected]):
Thank you!
This question is for [~jeffswindle] and fellow Tika devs (esp. [~rgauss]),
should we add macros as metadata items or inline them in the content via <div>
elements?
I'd prefer a metadata item for each macro, but could go either way.
[~jeffswindle], the title of this issue is for msoffice...is it ok to limit
this to ooxml? Do you need this for the older doc and xls?
> Extract Macro text from Microsoft Office documents
> --------------------------------------------------
>
> Key: TIKA-2069
> URL: https://issues.apache.org/jira/browse/TIKA-2069
> Project: Tika
> Issue Type: Improvement
> Components: detector, parser
> Affects Versions: 1.13
> Environment: RHEL 5.x, Apache Tomcat
> Reporter: Jeff Swindle
> Labels: features
> Attachments: excel-macro.PNG, test-macro-doc.docm,
> test-macro-doc.docm-tika-app-output.txt, word-macro.PNG, xlsmacro.xlsm,
> xlsmacro.xlsm.tika-app-output.txt
>
>
> Tika supports macro-enabled Microsoft Office documents by extracting metadata
> and contents; however, macros within the document are not in the metadata or
> content output.
> The desire is to have the macro text extracted as well.
> Info regarding macro extraction: http://www.decalage.info/vba_tools