[ https://issues.apache.org/jira/browse/TIKA-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-2448:
------------------------------
    Attachment: testWORD_phonetic.docx

example docx

> Handle phonetic strings in the SAX docx parser
> ----------------------------------------------
>
>                 Key: TIKA-2448
>                 URL: https://issues.apache.org/jira/browse/TIKA-2448
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>              Labels: sax_docx_fixes
>         Attachments: testWORD_phonetic.docx
>
>
> On TIKA-2440, [~Takahiro] requested the ability to turn off extraction of 
> phonetic runs.  We should enable this for docx, too.  We'll have to make 
> fixes in POI for our DOM docx parser, but it should be fairly straightforward 
> in our SAX docx parser.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
