[ 
https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132895#comment-15132895
 ] 

Tim Allison commented on TIKA-1836:
-----------------------------------

Committed a workaround in POI r1728547 that logs the problem rather than 
throwing an exception.  Once the next version of POI is released and 
integrated into Tika, this issue should be "fixed" at the Tika level.  The 
true fix would be to add parsing for that kind of record (non-extended 
character Pascal strings) in POI...any takers?
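
For anyone unfamiliar with the "log rather than throw" approach, here is a 
minimal sketch of the general pattern. It is not the code committed to POI: 
the class name LenientStringTable, the byte layout, and the 0xFFFF marker 
check are all assumptions made for illustration, and it uses 
java.util.logging rather than POI's own logger.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for a string-table reader; not POI's actual Sttb class.
final class LenientStringTable {
    private static final Logger LOG =
            Logger.getLogger(LenientStringTable.class.getName());

    // Assumed marker for "extended" (2-byte character) tables in this sketch.
    static final int EXTENDED_MARKER = 0xFFFF;

    private LenientStringTable() {}

    /**
     * Reads a little-endian table laid out as: marker (2 bytes), entry count
     * (2 bytes), then for each entry a 2-byte character count followed by
     * that many UTF-16LE characters. Assumes the input is not truncated.
     *
     * A table without the extended marker is not parsed. Instead of throwing
     * UnsupportedOperationException, a warning is logged and an empty list is
     * returned so that the caller can carry on with the rest of the document.
     */
    static List<String> read(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        int marker = buf.getShort() & 0xFFFF;
        if (marker != EXTENDED_MARKER) {
            LOG.warning("Non-extended string table is not supported; "
                    + "returning no entries");
            return Collections.emptyList();
        }
        int count = buf.getShort() & 0xFFFF;
        List<String> entries = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int chars = buf.getShort() & 0xFFFF;
            byte[] bytes = new byte[chars * 2];
            buf.get(bytes);
            entries.add(new String(bytes, StandardCharsets.UTF_16LE));
        }
        return entries;
    }
}
```

The trade-off is that the affected table comes back empty, so whatever it 
held is silently missing from the output apart from the log line. That is 
why parsing the record properly would still be the real fix.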

> Conversion DOC->TXT failed due to POI issue
> -------------------------------------------
>
>                 Key: TIKA-1836
>                 URL: https://issues.apache.org/jira/browse/TIKA-1836
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.11
>         Environment: Distributor ID:  Ubuntu
> Description:  Ubuntu 12.04.5 LTS
> Release:      12.04
> Codename:     precise
> java version "1.7.0_91"
> OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-0ubuntu0.12.04.1)
> OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)
>            Reporter: Jorge Spinsanti
>         Attachments: test.doc
>
>
> When we try to convert DOC -> TXT, we get the following stack trace:
> {code}
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@1ddeedb6
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       ... 15 more
> Caused by: java.lang.UnsupportedOperationException: Non-extended character Pascal strings are not supported right now. Please, contact POI developers for update.
>       at org.apache.poi.hwpf.model.Sttb.fillFields(Sttb.java:82)
>       at org.apache.poi.hwpf.model.Sttb.<init>(Sttb.java:61)
>       at org.apache.poi.hwpf.model.SttbUtils.readSttbSavedBy(SttbUtils.java:52)
>       at org.apache.poi.hwpf.model.SavedByTable.<init>(SavedByTable.java:53)
>       at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:361)
>       at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       ... 22 more
> {code}



