[ https://issues.apache.org/jira/browse/TIKA-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-2448:
------------------------------
Attachment: testWORD_phonetic.docx
example docx
> Handle phonetic strings in the SAX docx parser
> ----------------------------------------------
>
> Key: TIKA-2448
> URL: https://issues.apache.org/jira/browse/TIKA-2448
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Labels: sax_docx_fixes
> Attachments: testWORD_phonetic.docx
>
>
> On TIKA-2440, [~Takahiro] requested the ability to turn off extraction of
> phonetic runs. We should enable this for docx, too. We'll have to make
> fixes in POI for our DOM docx parser, but it should be fairly straightforward
> in our SAX docx parser.
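
To illustrate why this is simpler on the SAX side, here is a minimal sketch of the idea. It is not Tika's actual implementation. It uses only the JDK SAX API and assumes that phonetic text sits in w:rt elements inside w:ruby, with the base text in w:rubyBase, as in WordprocessingML. The class name PhoneticSkippingHandler and the includePhonetic flag are hypothetical and chosen only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects w:t text and optionally drops phonetic (w:rt) text.
class PhoneticSkippingHandler extends DefaultHandler {
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private final boolean includePhonetic; // hypothetical flag, not a Tika setting
    private final StringBuilder text = new StringBuilder();
    private int rtDepth = 0;        // > 0 while inside a w:rt (phonetic) element
    private boolean inText = false; // true while inside a w:t element

    PhoneticSkippingHandler(boolean includePhonetic) {
        this.includePhonetic = includePhonetic;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!W_NS.equals(uri)) {
            return;
        }
        if ("rt".equals(localName)) {
            rtDepth++;
        } else if ("t".equals(localName)) {
            inText = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!W_NS.equals(uri)) {
            return;
        }
        if ("rt".equals(localName)) {
            rtDepth--;
        } else if ("t".equals(localName)) {
            inText = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Keep text unless it is phonetic text and phonetic extraction is off.
        if (inText && (includePhonetic || rtDepth == 0)) {
            text.append(ch, start, length);
        }
    }

    // Parses a WordprocessingML fragment and returns the collected text.
    static String extract(String xml, boolean includePhonetic) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName and uri are populated
        PhoneticSkippingHandler handler = new PhoneticSkippingHandler(includePhonetic);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }
}
```

Because the handler sees w:rt start and end events directly, skipping phonetic runs only requires tracking a depth counter. A DOM-based extractor has to rely on what the underlying object model exposes for ruby elements.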