[ https://issues.apache.org/jira/browse/TIKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710097#comment-15710097 ]

Hudson commented on TIKA-2187:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1147 (See [https://builds.apache.org/job/Tika-trunk/1147/])
TIKA-2187 -- change default behavior in experimental .docx parser to (tallison: rev fe20ecd83ea43e5ec6ad0e9fded9d803cb011251)
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
* (edit) CHANGES.txt
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/xwpf/SXWPFExtractorTest.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/xwpf/ml2006/Word2006MLParserTest.java
* (add) tika-parsers/src/test/resources/test-documents/testWORD_2006ml.doc
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParserConfig.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java


> Align default behavior of experimental docx parser with that of doc parser in 
> handling delText
> ----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2187
>                 URL: https://issues.apache.org/jira/browse/TIKA-2187
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 2.0, 1.15
>
>
> Now that we can ignore delText via the experimental alternate SAXParser for 
> .docx files, let's make that the default behavior to align with the expected 
> behavior for our .doc parser (ignore deleted text).
> Let's also add the ability to include deleted text from .doc files.
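
To illustrate the behavior described above, here is a minimal sketch of a SAX handler that skips deleted text unless asked to include it. It is not Tika's implementation: the class name DelTextSketch, the method extractText, and the includeDeleted flag are hypothetical, and the sketch assumes only that regular run text sits in w:t elements and tracked deletions in w:delText elements of the WordprocessingML namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of Tika.
public class DelTextSketch {

    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Collects run text from a WordprocessingML fragment.
    // Text inside w:delText is kept only when includeDeleted is true.
    static String extractText(String xml, boolean includeDeleted) throws Exception {
        StringBuilder out = new StringBuilder();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                private boolean capture = false;

                @Override
                public void startElement(String uri, String local, String qName,
                                         Attributes atts) {
                    if (!W_NS.equals(uri)) {
                        return;
                    }
                    if ("t".equals(local)) {
                        capture = true;            // ordinary text is always kept
                    } else if ("delText".equals(local)) {
                        capture = includeDeleted;  // deleted text is opt-in
                    }
                }

                @Override
                public void endElement(String uri, String local, String qName) {
                    if (W_NS.equals(uri)
                            && ("t".equals(local) || "delText".equals(local))) {
                        capture = false;
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    if (capture) {
                        out.append(ch, start, length);
                    }
                }
            });
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:p xmlns:w=\"" + W_NS + "\">"
            + "<w:r><w:t>kept </w:t></w:r>"
            + "<w:del><w:r><w:delText>removed </w:delText></w:r></w:del>"
            + "<w:r><w:t>text</w:t></w:r></w:p>";
        System.out.println(extractText(xml, false)); // default: deletions ignored
        System.out.println(extractText(xml, true));  // opt-in: deletions included
    }
}
```

With the sample fragment in main, the first call yields "kept text" and the second "kept removed text", which mirrors the default (ignore) and opt-in (include) behaviors the issue asks for.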



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
