[ https://issues.apache.org/jira/browse/TIKA-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Moritz Dorka updated TIKA-1468:
-------------------------------
    Attachment: WordExtractor.patch

> Symbol character handling in WordExtractor
> ------------------------------------------
>
>                 Key: TIKA-1468
>                 URL: https://issues.apache.org/jira/browse/TIKA-1468
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Moritz Dorka
>            Priority: Minor
>         Attachments: WordExtractor.patch
>
>
> Attached is a patch to allow for proper handling of _symbol characters_ in
> *.doc files (i.e. characters which can be inserted via Insert->Symbol in Word).
> Side note: I am a little unsure where exactly the boundary between the scope
> of TIKA and POI lies here. Theoretically, one could add that patch to
> {{org.apache.poi.hwpf.converter.AbstractWordConverter.processSymbol(HWPFDocument, CharacterRun, Element)}}
> as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)