[ 
https://issues.apache.org/jira/browse/TIKA-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725699#comment-15725699
 ] 

Hudson commented on TIKA-2191:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1150 (See [https://builds.apache.org/job/Tika-trunk/1150/])

TIKA-2191 -- step1 -- add other docx tests and comment/ignore where (tallison: rev 894301307da5167c95585688f9448d3050f53aaa)
* (add) tika-parsers/src/test/resources/org/apache/tika/parser/microsoft/tika-config-sax-docx.xml
* (add) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXWPFExtractorTest.java
* (delete) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/xwpf/SXWPFExtractorTest.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java

TIKA-2191 -- step2 -- add handling for docm files...extract macros (tallison: rev f93d4e1fffdb4a441f7fa750a43691adfa70c392)
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/SXWPFWordExtractorDecorator.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXWPFExtractorTest.java

TIKA-2191 -- step 3 -- clean up <b> and <i> tag handling (tallison: rev 1aca10a26dada02a045a1bc9eb7c3cfc1b36a83e)
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFEventBasedWordExtractor.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFDocumentXMLBodyHandler.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFTikaBodyPartHandler.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXWPFExtractorTest.java

TIKA-2191 -- step 4-- add markup for embedded pics (tallison: rev 806eaf8b1802a3a3071a5ae0bdc35c20d6739280)
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/SXWPFWordExtractorDecorator.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFTikaBodyPartHandler.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXWPFExtractorTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFDocumentXMLBodyHandler.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFEventBasedWordExtractor.java

TIKA-2191 -- step 5 actually extract images embedded in areas besides (tallison: rev 4469ca2c4ea725e9f5d94c116aaf248deea2a6eb)
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/SXWPFWordExtractorDecorator.java
* (add) tika-parsers/src/test/resources/test-documents/testWORD_embedded_pics.docx
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXWPFExtractorTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/XWPFDocumentXMLBodyHandler.java

update changes for TIKA-2191 and TIKA-2192 (tallison: rev 5425d02a1ed97ce5f884a076f55ad8197cc6ac7b)
* (edit) CHANGES.txt


> Apply current .docx unit tests to experimental SAX parser and fix or document 
> as necessary
> ------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2191
>                 URL: https://issues.apache.org/jira/browse/TIKA-2191
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>
> There are many areas for cleanup to ensure that the new SAX .docx parser 
> yields results similar to those of the legacy DOM .docx parser.  Let's use 
> this issue to track work on improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)