[ https://issues.apache.org/jira/browse/TIKA-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-1515:
------------------------------
Summary: Old XLS 3 parsing is not working on some documents (was: Old XLS 3 parsing is not working)
> Old XLS 3 parsing is not working on some documents
> --------------------------------------------------
>
> Key: TIKA-1515
> URL: https://issues.apache.org/jira/browse/TIKA-1515
> Project: Tika
> Issue Type: Bug
> Reporter: Tim Allison
> Priority: Minor
>
> Thanks to [~gagravarr], we now have mime type identification for excel.sheet.4
> and excel.sheet.3, and we have parsing for excel.sheet.4. It looks like there
> are two issues with excel.sheet.3 parsing on most excel.sheet.3 files in
> govdocs1.
> The predominant issue (169 out of 173) appears to stem from a bad/missing
> code page parse:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Unsupported codepage requested
> at org.apache.poi.hssf.record.OldStringRecord.getString(OldStringRecord.java:83)
> at org.apache.poi.hssf.record.OldLabelRecord.getValue(OldLabelRecord.java:82)
> at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:159)
> at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:82)
> at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:76)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
> ... 41 more
> Caused by: java.io.UnsupportedEncodingException: Codepage number may not be -32767
> at org.apache.poi.util.CodePageUtil.codepageToEncoding(CodePageUtil.java:275)
> at org.apache.poi.util.CodePageUtil.codepageToEncoding(CodePageUtil.java:253)
> at org.apache.poi.util.CodePageUtil.getStringFromCodePage(CodePageUtil.java:231)
> at org.apache.poi.util.CodePageUtil.getStringFromCodePage(CodePageUtil.java:219)
> at org.apache.poi.hssf.record.OldStringRecord.getString(OldStringRecord.java:81)
> ... 46 more
> {noformat}
> The second issue only affects 4 documents.
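A note on the failing value in the trace: -32767 looks like a 16-bit code page field read as a signed short. Read as unsigned it is 0x8001 (65536 - 32767 = 32769). The OpenOffice.org notes on the Excel file format list 0x8001 as Windows CP-1252 as written by BIFF2/BIFF3, which would fit old excel.sheet.3 files, but that is an assumption here and has not been checked against these govdocs1 documents. The following is a minimal plain-Java sketch (no POI dependency) of treating the field as unsigned and falling back to CP-1252 instead of throwing. The class `BiffCodepageSketch` and its method `charsetFor` are hypothetical and are not POI or Tika API.

```java
import java.nio.charset.Charset;

// Hypothetical helper, NOT part of POI or Tika: maps a raw BIFF code page field to a Java Charset.
public final class BiffCodepageSketch {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    private BiffCodepageSketch() {}

    public static Charset charsetFor(short rawCodepage) {
        // Treat the 16-bit field as unsigned: (short) -32767 becomes 32769 (0x8001).
        int cp = rawCodepage & 0xFFFF;

        // Assumption: 0x8001 is the BIFF2/BIFF3 code for Windows CP-1252.
        if (cp == 0x8001) {
            cp = 1252;
        }
        // Windows ANSI code pages 1250-1258, if the running JVM provides them.
        if (cp >= 1250 && cp <= 1258) {
            String name = "windows-" + cp;
            if (Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        }
        // Fallback assumption: decode unknown code pages as CP-1252 instead of throwing.
        return CP1252;
    }

    public static void main(String[] args) {
        // CP-1252 bytes for a curly-quoted "Hi" (0x93 and 0x94 are the quote marks).
        byte[] raw = {(byte) 0x93, 'H', 'i', (byte) 0x94};
        Charset cs = charsetFor((short) -32767);
        System.out.println(cs.name() + ": " + new String(raw, cs));
    }
}
```

Falling back silently trades a hard failure for possibly mis-decoded text. Logging the unrecognized code page before using the fallback might be a reasonable compromise.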
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)