[ https://issues.apache.org/jira/browse/TIKA-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
onyas updated TIKA-1460:
------------------------
Description:
For some reason I could not upload the file, so here is the information.
I checked the \org\apache\pdfbox\resources\cmap directory in every version and did not find an 'Adobe-GBK1-UCS2' file.
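The manual check above can also be done from code by asking the class loader for the resource, which searches every jar on the classpath in one call. This is a minimal sketch: the class and method names are hypothetical, and the package path is taken from the directory named in this report rather than confirmed against any particular PDFBox release. Resource names always use forward slashes, even on Windows.

```java
import java.net.URL;

// Hypothetical helper: reports whether a CMap file is visible on the classpath.
public class CmapCheck {

    // Directory named in this report, written as a resource path ('/' separators).
    static final String CMAP_DIR = "org/apache/pdfbox/resources/cmap/";

    static String cmapResourcePath(String cmapName) {
        return CMAP_DIR + cmapName;
    }

    /** Returns the URL of the CMap resource, or null if no jar on the classpath provides it. */
    static URL findCmap(String cmapName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = CmapCheck.class.getClassLoader();
        }
        return loader.getResource(cmapResourcePath(cmapName));
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "Adobe-GBK1-UCS2";
        URL url = findCmap(name);
        System.out.println(url == null
                ? "CMap '" + name + "' not found on the classpath"
                : "CMap '" + name + "' found at " + url);
    }
}
```

Run it with the same classpath as the application so it sees the same PDFBox jar.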
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@d640af
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
Caused by: java.lang.IllegalArgumentException: Position 66048 past the end of the file
    at org.apache.poi.poifs.nio.FileBackedDataSource.read(FileBackedDataSource.java:50)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getBlockAt(NPOIFSFileSystem.java:420)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readBAT(NPOIFSFileSystem.java:397)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readCoreContents(NPOIFSFileSystem.java:356)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:202)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:156)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 21 more
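A note on reading this trace: the exception is raised inside Apache POI's OLE2 reader (NPOIFSFileSystem, called from Tika's OfficeParser), not inside PDFBox, so the input here appears to have been detected as an OLE2-based Office file rather than a PDF. "Position 66048 past the end of the file" means the file's header lists a block the file is too short to contain, which points to a truncated or corrupt file. Assuming 512-byte sectors and that POI places block n at byte offset (n + 1) × 512 (the 512-byte header comes first), offset 66048 corresponds to block 128, since 129 × 512 = 66048. The sketch below is a hypothetical pre-check (class and method names are made up) that reads header fields defined by the Compound File Binary format: the signature, the sector shift at offset 30, the FAT sector count at offset 44, and the first 109 FAT sector numbers starting at offset 76. It reports the first listed FAT sector that lies beyond the end of the file; it does not follow the extra DIFAT chain used by very large files.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical pre-check for truncated OLE2 (Compound File Binary) files.
public class Ole2TruncationCheck {

    // Signature bytes D0 CF 11 E0 A1 B1 1A E1 read as a little-endian long.
    private static final long OLE2_SIGNATURE = 0xE11AB1A1E011CFD0L;
    private static final int HEADER_SIZE = 512;
    private static final int HEADER_DIFAT_ENTRIES = 109; // FAT sector numbers stored in the header itself

    /** Byte offset of sector n: the header fills the first sector-sized slot, so sector 0 follows it. */
    static long sectorOffset(int sector, int sectorSize) {
        return (long) (sector + 1) * sectorSize;
    }

    static int sectorSize(ByteBuffer header) {
        return 1 << header.getShort(30); // sector shift: 9 -> 512 bytes, 12 -> 4096 bytes
    }

    /** Returns the first header-listed FAT sector that the file cannot hold, or -1 if all fit. */
    static int firstMissingFatSector(byte[] headerBytes, long fileLength) {
        if (headerBytes.length < HEADER_SIZE) {
            throw new IllegalArgumentException("header must be at least 512 bytes");
        }
        ByteBuffer header = ByteBuffer.wrap(headerBytes).order(ByteOrder.LITTLE_ENDIAN);
        if (header.getLong(0) != OLE2_SIGNATURE) {
            throw new IllegalArgumentException("not an OLE2 file");
        }
        int size = sectorSize(header);
        int fatSectors = Math.min(header.getInt(44), HEADER_DIFAT_ENTRIES);
        for (int i = 0; i < fatSectors; i++) {
            int sector = header.getInt(76 + 4 * i);
            // Flag sectors that are missing entirely or cut off part-way.
            if (sectorOffset(sector, size) + size > fileLength) {
                return sector;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r")) {
            byte[] header = new byte[HEADER_SIZE];
            file.readFully(header);
            int missing = firstMissingFatSector(header, file.length());
            if (missing < 0) {
                System.out.println("All FAT sectors listed in the header lie inside the file");
            } else {
                int size = sectorSize(ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN));
                System.out.println("FAT sector " + missing + " starts at offset "
                        + sectorOffset(missing, size) + ", but the file is only "
                        + file.length() + " bytes long");
            }
        }
    }
}
```

Running it against the file that fails to parse would show whether the file itself is short, as opposed to a problem in Tika.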
The relevant code is:

    Parser parser = new AutoDetectParser();
    ContentHandler handler = new BodyContentHandler(getNum());
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();
    InputStream stream = null;
    StringBuffer content = new StringBuffer();
    try {
        stream = new FileInputStream(file);
        if (stream != null) {
            parser.parse(stream, handler, metadata, context);
            content = content.append(handler);
            if (StringUtils.isNotBlank(content.toString())) {
                hasContent = true;
                handler = null;
                metadata = null;
                context = null;
            }
        }

The exception is thrown at this line: parser.parse(stream, handler, metadata, context);
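In the snippet above, the POI failure reaches the caller as a TikaException whose cause is the IllegalArgumentException. When processing many files, one option is to catch TikaException around parser.parse(...), log the innermost cause, and continue with the next file. The helper below (rootCause is a hypothetical name, not a Tika API) walks getCause() to the innermost throwable; the demo uses a plain Exception as a stand-in for the TikaException so it runs without Tika on the classpath. Apache Commons Lang's ExceptionUtils offers a comparable helper if that library is already in use.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class RootCause {

    /** Follows getCause() to the innermost throwable, stopping if a cause cycle is detected. */
    static Throwable rootCause(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the TikaException from the report (plain Exception, so no Tika jar is needed).
        Exception wrapped = new Exception(
                "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser",
                new IllegalArgumentException("Position 66048 past the end of the file"));
        Throwable root = rootCause(wrapped);
        System.out.println(root.getClass().getSimpleName() + ": " + root.getMessage());
    }
}
```

Running main prints "IllegalArgumentException: Position 66048 past the end of the file".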
> Could not parse predefined CMAP file for 'Adobe-GBK1-UCS2'
> ----------------------------------------------------------
>
> Key: TIKA-1460
> URL: https://issues.apache.org/jira/browse/TIKA-1460
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.3
> Environment: Windows 7, MyEclipse 8.5
> Reporter: onyas
> Priority: Critical
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)