[
https://issues.apache.org/jira/browse/TIKA-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330031#comment-14330031
]
Tyler Palsulich commented on TIKA-1460:
---------------------------------------
Hi [~onyas]. The attachment dialog isn't in a very intuitive spot: it's under
More > Attach files. I found a PostScript version of the CMAP file under
{{/usr/share/fonts/cmap/}}, but not a PDF one. I'm also curious whether a newer
version of Tika would solve your problem.
> Could not parse predefined CMAP file for 'Adobe-GBK1-UCS2'
> ----------------------------------------------------------
>
> Key: TIKA-1460
> URL: https://issues.apache.org/jira/browse/TIKA-1460
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.3
> Environment: Windows 7, MyEclipse 8.5
> Reporter: onyas
> Priority: Critical
>
> For some reason I could not upload the file, so here is the information.
> I checked all the versions in the directory
> \org\apache\pdfbox\resources\cmap and did not find the 'Adobe-GBK1-UCS2' file.
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@d640af
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> Caused by: java.lang.IllegalArgumentException: Position 66048 past the end of the file
>     at org.apache.poi.poifs.nio.FileBackedDataSource.read(FileBackedDataSource.java:50)
>     at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getBlockAt(NPOIFSFileSystem.java:420)
>     at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readBAT(NPOIFSFileSystem.java:397)
>     at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readCoreContents(NPOIFSFileSystem.java:356)
>     at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:202)
>     at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:156)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     ... 21 more
> The main code is:
> Parser parser = new AutoDetectParser();
> ContentHandler handler = new BodyContentHandler(getNum());
> Metadata metadata = new Metadata();
> ParseContext context = new ParseContext();
> InputStream stream = null;
> StringBuffer content = new StringBuffer();
> try {
>     stream = new FileInputStream(file);
>     if (stream != null) {
>         parser.parse(stream, handler, metadata, context);
>         content = content.append(handler);
>
>         if (StringUtils.isNotBlank(content.toString())) {
>             hasContent = true;
>             handler = null;
>             metadata = null;
>             context = null;
>         }
>     }
> The exception is thrown at this line:
> parser.parse(stream, handler, metadata, context);
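
A note on the trace above: the "Position 66048 past the end of the file" message
is raised while POI reads a block allocation table (BAT) sector, which suggests
the OLE2 header points at a sector the file does not actually contain, for
example because the document is truncated. Assuming 512-byte sectors and the
usual (sector + 1) * sectorSize offset, 66048 would correspond to sector 128.
The sketch below is one way to check that before handing a file to the parser.
It uses only the JDK; the class and method names are hypothetical and not part
of Tika or POI, and it only inspects the 109 BAT entries stored in the header
itself.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not a Tika or POI class.
final class Ole2TruncationCheck {
    // Signature bytes D0 CF 11 E0 A1 B1 1A E1, read as a little-endian long.
    private static final long OLE2_MAGIC = 0xE11AB1A1E011CFD0L;
    private static final int HEADER_SIZE = 512;
    private static final int SECTOR_SHIFT_OFFSET = 30;   // uint16
    private static final int HEADER_BAT_OFFSET = 76;     // 109 x uint32
    private static final int HEADER_BAT_ENTRIES = 109;
    private static final long MAX_REGULAR_SECTOR = 0xFFFFFFFAL;

    /**
     * Returns the first BAT sector index listed in the header whose start
     * offset is at or beyond fileLength, or -1 if every listed sector starts
     * inside the file.
     */
    static long firstMissingFatSector(byte[] header, long fileLength) {
        if (header.length < HEADER_SIZE) {
            throw new IllegalArgumentException("need the full 512-byte header");
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getLong(0) != OLE2_MAGIC) {
            throw new IllegalArgumentException("not an OLE2 compound file");
        }
        int shift = buf.getShort(SECTOR_SHIFT_OFFSET);
        if (shift != 9 && shift != 12) {
            throw new IllegalArgumentException("unexpected sector shift: " + shift);
        }
        long sectorSize = 1L << shift;
        for (int i = 0; i < HEADER_BAT_ENTRIES; i++) {
            long sector = buf.getInt(HEADER_BAT_OFFSET + 4 * i) & 0xFFFFFFFFL;
            if (sector > MAX_REGULAR_SECTOR) {
                continue; // unused slot or special marker value
            }
            // Sector n starts after the header-sized first block.
            long start = (sector + 1) * sectorSize;
            if (start >= fileLength) {
                return sector;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        byte[] header = new byte[HEADER_SIZE];
        try (InputStream in = Files.newInputStream(file)) {
            // Throws EOFException if the file is shorter than the header.
            new DataInputStream(in).readFully(header);
        }
        long missing = firstMissingFatSector(header, Files.size(file));
        System.out.println(missing < 0
                ? "All header BAT sectors start inside the file"
                : "BAT sector " + missing + " lies past the end of the file");
    }
}
```

If this reports a missing sector for the failing document, the file itself is
probably incomplete, and a newer Tika version would likely fail on it as well.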
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)