[ 
https://issues.apache.org/jira/browse/TIKA-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

onyas updated TIKA-1460:
------------------------
    Description: 
For some reason I could not upload the file, so here is the information.
I also checked every version in the directory
\org\apache\pdfbox\resources\cmap and did not find the 'Adobe-GBK1-UCS2' file.

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@d640af
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)

Caused by: java.lang.IllegalArgumentException: Position 66048 past the end of the file
        at org.apache.poi.poifs.nio.FileBackedDataSource.read(FileBackedDataSource.java:50)
        at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getBlockAt(NPOIFSFileSystem.java:420)
        at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readBAT(NPOIFSFileSystem.java:397)
        at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readCoreContents(NPOIFSFileSystem.java:356)
        at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:202)
        at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:156)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        ... 21 more


The main code is:

    Parser parser = new AutoDetectParser();
    ContentHandler handler = new BodyContentHandler(getNum());
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();
    InputStream stream = null;
    StringBuffer content = new StringBuffer();
    try {
        stream = new FileInputStream(file);
        if (stream != null) {
            parser.parse(stream, handler, metadata, context);
            content = content.append(handler);

            if (StringUtils.isNotBlank(content.toString())) {
                hasContent = true;
                handler = null;
                metadata = null;
                context = null;
            }
        }

The exception is thrown at this line:
parser.parse(stream, handler, metadata, context);
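
One possible reading of the root cause: 66048 equals (128 + 1) * 512, which is the byte offset of sector 128 in an OLE2 file with 512-byte sectors, so the header may be pointing at an allocation-table sector that a truncated or damaged file no longer contains. This is an inference from the number in the trace, not something the report confirms. The sketch below is one way to test that idea without Tika or POI. It assumes the OLE2 compound file header layout (signature in the first 8 bytes, sector shift at byte 30, the first 109 allocation-table sector numbers starting at byte 76). The class and method names are made up for illustration, and it does not follow the extended sector list that very large files use.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of Tika or POI.
final class Ole2HeaderCheck {
    // Signature bytes D0 CF 11 E0 A1 B1 1A E1, read as a little-endian long.
    private static final long OLE2_MAGIC = 0xE11AB1A1E011CFD0L;
    // Marker for an unused slot in the header's sector list.
    private static final long FREE_SECTOR = 0xFFFFFFFFL;
    private static final int HEADER_SIZE = 512;

    /**
     * Returns the byte offset of the first allocation-table sector listed in
     * the header that does not fit inside a file of the given length, or -1
     * if every listed sector fits.
     */
    static long firstFatSectorPastEnd(byte[] header, long fileLength) {
        if (header.length < HEADER_SIZE) {
            throw new IllegalArgumentException("need at least 512 header bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getLong(0) != OLE2_MAGIC) {
            throw new IllegalArgumentException("not an OLE2 compound file");
        }
        int shift = buf.getShort(30);
        if (shift != 9 && shift != 12) {
            throw new IllegalArgumentException("unexpected sector shift " + shift);
        }
        long sectorSize = 1L << shift; // 512 or 4096 bytes
        for (int i = 0; i < 109; i++) {
            long sector = Integer.toUnsignedLong(buf.getInt(76 + 4 * i));
            if (sector == FREE_SECTOR) {
                continue; // slot not in use
            }
            // Sector 0 starts right after the header-sized first block.
            long offset = (sector + 1) * sectorSize;
            if (offset + sectorSize > fileLength) {
                return offset;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args[0]);
        byte[] header = new byte[HEADER_SIZE];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(path))) {
            in.readFully(header); // throws EOFException if the file is shorter than a header
        }
        long offset = firstFatSectorPastEnd(header, Files.size(path));
        System.out.println(offset < 0
                ? "all listed allocation-table sectors fit in the file"
                : "allocation-table sector at offset " + offset + " lies past the end of the file");
    }
}
```

Passing the failing file's path as the only argument prints either the first offending offset or a note that the listed sectors fit. A reported offset would suggest that the file itself is incomplete or damaged, which would be worth ruling out before treating this as a parser bug.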


> Could not parse predefined CMAP file for 'Adobe-GBK1-UCS2'
> ----------------------------------------------------------
>
>                 Key: TIKA-1460
>                 URL: https://issues.apache.org/jira/browse/TIKA-1460
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.3
>         Environment: win7,myeclipse8.5
>            Reporter: onyas
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
