[ 
https://issues.apache.org/jira/browse/TIKA-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203923#comment-14203923
 ] 

Nick Burch commented on TIKA-1468:
----------------------------------

Any chance of a small JUnit test for this? Probably involving a short test
and a very small test Word document?

As for the right location of the logic, it might be better in POI itself. That 
way, users of POI will benefit too, and we minimise the amount of POI-specific 
logic in Tika. POI 3.11 beta 3 is being voted on right now, but we ought to be 
able to get it into the next release.

> Symbol character handling in WordExtractor
> ------------------------------------------
>
>                 Key: TIKA-1468
>                 URL: https://issues.apache.org/jira/browse/TIKA-1468
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Moritz Dorka
>            Priority: Minor
>         Attachments: WordExtractor.patch
>
>
> Attached is a patch to allow for proper handling of _symbol characters_ in 
> *.doc files (i.e. stuff which can be inserted via Insert->Symbol in Word).
> Side note: I am a little unsure where exactly the boundary between the scope 
> of TIKA and POI lies here. Theoretically one could add that patch to 
> {{org.apache.poi.hwpf.converter.AbstractWordConverter.processSymbol(HWPFDocument,
>  CharacterRun, Element)}} as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
