Moritz Dorka created TIKA-1468:
----------------------------------
Summary: Symbol character handling in WordExtractor
Key: TIKA-1468
URL: https://issues.apache.org/jira/browse/TIKA-1468
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.6
Reporter: Moritz Dorka
Priority: Minor
Attached is a patch to allow for proper handling of _symbol characters_ in
*.doc files (i.e. stuff which can be inserted via Insert->Symbol in Word).
Side note: I am a little unsure where exactly the boundary between the scope of
Tika and POI lies here. Theoretically, one could add that patch to
{{org.apache.poi.hwpf.converter.AbstractWordConverter.processSymbol(HWPFDocument,
CharacterRun, Element)}} as well.
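
For readers without the attachment, here is a minimal sketch of the kind of mapping involved. It is not the attached patch. It assumes that a symbol inserted via Insert->Symbol arrives as a font name plus a character code, and that the code may be offset into the private-use range 0xF000-0xF0FF. The class name {{SymbolCharSketch}}, the method {{toUnicode}}, and the four-entry table are hypothetical and only illustrate the idea for the "Symbol" font.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika or POI.
final class SymbolCharSketch {

    // Deliberately partial table: "Symbol" font character code -> Unicode.
    private static final Map<Integer, String> SYMBOL_FONT = new HashMap<>();
    static {
        SYMBOL_FONT.put(0x61, "\u03B1"); // Greek small letter alpha
        SYMBOL_FONT.put(0x62, "\u03B2"); // Greek small letter beta
        SYMBOL_FONT.put(0x70, "\u03C0"); // Greek small letter pi
        SYMBOL_FONT.put(0x44, "\u0394"); // Greek capital letter delta
    }

    // Returns the Unicode text for a symbol character, or U+FFFD if the
    // font or the code is not covered by the table above.
    static String toUnicode(String fontName, char code) {
        int c = code;
        // Assumption: codes in 0xF000-0xF0FF are the font's byte code
        // shifted into the private-use area, so strip the offset first.
        if (c >= 0xF000 && c <= 0xF0FF) {
            c -= 0xF000;
        }
        if ("Symbol".equals(fontName)) {
            String mapped = SYMBOL_FONT.get(c);
            if (mapped != null) {
                return mapped;
            }
        }
        return "\uFFFD"; // replacement character for anything unmapped
    }
}
```

A caller would pass the font name and character code obtained from the character run. Whether such a lookup belongs in Tika's WordExtractor or in POI's {{processSymbol}} is exactly the open question raised above.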