[ https://issues.apache.org/jira/browse/TIKA-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Moritz Dorka updated TIKA-1468:
-------------------------------
Attachment: WordExtractor.patch
> Symbol character handling in WordExtractor
> ------------------------------------------
>
> Key: TIKA-1468
> URL: https://issues.apache.org/jira/browse/TIKA-1468
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.6
> Reporter: Moritz Dorka
> Priority: Minor
> Attachments: WordExtractor.patch
>
>
> Attached is a patch to allow for proper handling of _symbol characters_ in
> *.doc files (i.e. stuff which can be inserted via Insert->Symbol in Word).
> Side note: I am a little unsure where exactly the boundary between the scope
> of Tika and POI lies here. Theoretically, one could add this patch to
> {{org.apache.poi.hwpf.converter.AbstractWordConverter.processSymbol(HWPFDocument,
> CharacterRun, Element)}} as well.
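
For readers without the attachment at hand, here is a minimal sketch of the
kind of mapping such symbol handling involves. It is not the attached patch.
It assumes that the symbol's font name and character code have already been
read from the run (POI's CharacterRun exposes isSymbol(), getSymbolCharacter()
and getSymbolFont() for this), and that Word may store the code offset into
the Unicode private use area at 0xF000. The class name, the method name and
the three-entry table are hypothetical and only illustrate the idea.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika or POI.
class SymbolCharSketch {

    // Deliberately tiny excerpt of the "Symbol" font encoding.
    // A real mapping would need the full table for each symbol font.
    private static final Map<Integer, String> SYMBOL_FONT = new HashMap<>();
    static {
        SYMBOL_FONT.put(0x61, "\u03B1"); // Greek small letter alpha
        SYMBOL_FONT.put(0x62, "\u03B2"); // Greek small letter beta
        SYMBOL_FONT.put(0x70, "\u03C0"); // Greek small letter pi
    }

    // Returns a Unicode string for a symbol character, or the character
    // itself when no mapping is known.
    static String toUnicode(String fontName, char symbolChar) {
        int code = symbolChar;
        // Assumption: codes in 0xF000..0xF0FF are font code points shifted
        // into the private use area, so the offset is removed first.
        if (code >= 0xF000 && code <= 0xF0FF) {
            code -= 0xF000;
        }
        if ("Symbol".equalsIgnoreCase(fontName)) {
            String mapped = SYMBOL_FONT.get(code);
            if (mapped != null) {
                return mapped;
            }
        }
        // Fallback: pass the original character through unchanged.
        return String.valueOf(symbolChar);
    }
}
```

With this sketch, toUnicode("Symbol", (char) 0xF061) yields the Greek letter
alpha, while a character from an unmapped font is returned as-is. Where such
a table should live (Tika's WordExtractor or POI's converter) is exactly the
open question raised in the side note above.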
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)