Tim Allison resolved TIKA-2069.
       Resolution: Fixed
    Fix Version/s: 1.14

I think there may be a bit more work to do at the POI level.  There are still a 
few open issues in POI for NPE, AIOOBE, etc.  Tika is currently swallowing 
these exceptions.  I plan to do a run against our regression corpus with the 
swallowing turned off to help us prioritize known bugs and identify new ones in 
macro extraction at the POI level.
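
As a rough illustration of the swallowing described above, here is a minimal 
sketch.  It is not Tika's actual code: the class name, the method name, the 
`swallow` flag and the `Supplier` standing in for the call into POI's macro 
reader are all hypothetical, chosen only to show the idea of a switch that 
either records runtime exceptions or lets them propagate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from Tika or POI.
class SwallowSketch {

    /**
     * Runs a macro reader (a stand-in for the call into POI).
     * If swallow is true, runtime exceptions such as NPE or AIOOBE are
     * recorded in the swallowed list and an empty result is returned.
     * If swallow is false, the exception propagates to the caller, which
     * is what a regression run would want in order to surface bugs.
     */
    static List<String> extractMacros(Supplier<List<String>> macroReader,
                                      boolean swallow,
                                      List<RuntimeException> swallowed) {
        try {
            return macroReader.get();
        } catch (RuntimeException e) {
            if (!swallow) {
                throw e; // swallowing turned off: let the failure show up
            }
            swallowed.add(e); // keep a record instead of losing it silently
            return new ArrayList<>();
        }
    }
}
```

With `swallow` set to true, a reader that throws yields an empty list and one 
recorded exception; with it set to false, the same reader's exception reaches 
the caller and can be counted per file across the corpus.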

I also found that POI wasn't extracting macros from the 'ppt' file I created as 
a test (see [poi 60162|https://bz.apache.org/bugzilla/show_bug.cgi?id=60162]). 

Patches are welcome!

Let's close this ticket and open another to track the improvements in POI.

> Extract Macro text from Microsoft Office documents
> --------------------------------------------------
>                 Key: TIKA-2069
>                 URL: https://issues.apache.org/jira/browse/TIKA-2069
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector, parser
>    Affects Versions: 1.13
>         Environment: RHEL 5.x, Apache Tomcat
>            Reporter: Jeff Swindle
>              Labels: features
>             Fix For: 2.0, 1.14
>         Attachments: excel-macro.PNG, test-macro-doc.docm, 
> test-macro-doc.docm-tika-app-output.txt, word-macro.PNG, xlsmacro.xlsm, 
> xlsmacro.xlsm.tika-app-output.txt
> Tika supports macro-enabled Microsoft Office documents by extracting metadata 
> and contents; however, macros within the document are not in the metadata or 
> content output.
> The desire is to have the macro text extracted as well.
> Info regarding macro extraction: http://www.decalage.info/vba_tools

This message was sent by Atlassian JIRA
