[ 
https://issues.apache.org/jira/browse/TIKA-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471937#comment-17471937
 ] 

Tim Allison commented on TIKA-3639:
-----------------------------------

Thank you for opening this and submitting a triggering file. This is a bug that 
is now fixed in 2.2.2-SNAPSHOT.

 

This happens when you use the iworks detector by itself (e.g. if you are only 
using the tika-parser-apple-module). 

 

In Tika 2.x, instead of importing tika-parsers as you did in Tika 1.x, if you 
want to import "most of the basic parsers", you should import 
tika-parsers-standard-package (and maybe tika-parsers-sqlite3-module).  See 
[https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0].

 

Thank you, again, and please let us know if you have any questions.

> NullPointerException  throws when parsing zip file
> --------------------------------------------------
>
>                 Key: TIKA-3639
>                 URL: https://issues.apache.org/jira/browse/TIKA-3639
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.2.0, 2.2.1
>            Reporter: Kaka Lee
>            Assignee: Tim Allison
>            Priority: Blocker
>         Attachments: 123.zip, IWORKDocumentType.png, detectype.png, 
> exception.png
>
>
> Tika always throws a NullPointerException when detecting a zip file. It can 
> be reproduced with the following steps.
>  # Create a zip file containing an index.xml; the XML is simple:
> {code:java}
> <?xml version='1.0' encoding='UTF-8' ?>
> <index>
> </index> {code}
>  
>  # Add the dependencies to pom.xml; the *key* dependency is 
> *tika-parser-apple-module*:
> {code:xml}
> <dependencies>
>     <dependency>
>         <groupId>org.apache.tika</groupId>
>         <artifactId>tika-core</artifactId>
>         <version>2.2.1</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.tika</groupId>
>         <artifactId>tika-parsers</artifactId>
>         <type>pom</type>
>         <version>2.2.1</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.tika</groupId>
>         <artifactId>tika-parser-apple-module</artifactId>
>         <version>2.2.1</version>
>     </dependency>
> </dependencies>
> {code}
>  # Call tika.detect on the zip file; it throws an NPE:
> {code:java}
> import java.io.File;
> import java.io.FileInputStream;
> import org.apache.tika.Tika;
>
> String filePath = "123.zip";
> Tika tika = new Tika();
> String type = tika.detect(new FileInputStream(new File(filePath)));{code}
>  Note that tika.detect(String name) works normally and returns 
> "application/zip"; the NPE occurs only with 
> tika.detect(InputStream stream).
>  
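For anyone without the attached 123.zip, step 1 can be sketched with the JDK alone. This is a minimal sketch, assuming the archive needs nothing beyond a single index.xml entry with the content shown above; the class and method names (TriggerZipBuilder, buildTriggerZip, firstEntryName) are made up for illustration, and whether these exact bytes trigger the NPE has not been verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class TriggerZipBuilder {

    // The index.xml from the report: a root element with no namespace.
    static final String INDEX_XML =
            "<?xml version='1.0' encoding='UTF-8' ?>\n<index>\n</index>\n";

    /** Builds an in-memory zip whose only entry is index.xml. */
    public static byte[] buildTriggerZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("index.xml"));
            zip.write(INDEX_XML.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back the name of the first entry to confirm the archive layout. */
    public static String firstEntryName(byte[] zipBytes) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] zipBytes = buildTriggerZip();
        System.out.println(firstEntryName(zipBytes));
        // With tika-core and tika-parser-apple-module 2.2.1 on the classpath,
        // the call from the report would be:
        //   new Tika().detect(new ByteArrayInputStream(zipBytes));
    }
}
```

Keeping the archive in memory avoids depending on a file path; writing the bytes to 123.zip would match the report's steps more literally.
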
> It seems that when Tika handles a zip file through {*}IWorkPackageParser{*}, 
> it parses index.xml and compares the root element against the KEYNOTE, 
> NUMBERS, PAGES, and ENCRYPTED types defined in the enum below. When none of 
> KEYNOTE, NUMBERS, or PAGES matches, the loop reaches ENCRYPTED, whose 
> namespace is null, so the for-loop throws an NPE.
> The relevant source code:
> {code:java}
> KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation",
>                 MediaType.application("vnd.apple.keynote")),
> NUMBERS("http://developer.apple.com/namespaces/ls", "document",
>                 MediaType.application("vnd.apple.numbers")),
> PAGES("http://developer.apple.com/namespaces/sl", "document",
>                 MediaType.application("vnd.apple.pages")),
> ENCRYPTED(null, null, MediaType.application("x-tika-iworks-protected"));
> {code}
> {code:java}
> public static IWORKDocumentType detectType(InputStream stream) {
>     QName qname = new XmlRootExtractor().extractRootElement(stream);
>     if (qname != null) {
>         String uri = qname.getNamespaceURI();
>         String local = qname.getLocalPart();
>         for (IWORKDocumentType type : values()) {
>             if (type.getNamespace().equals(uri) && type.getPart().equals(local)) {
>                 return type;
>             }
>         }
> {code}
>  
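The failure mode described above can be sketched in isolation, without Tika on the classpath. The enum below (DocType) is a hypothetical stand-in for IWORKDocumentType with the media types dropped, and safeLookup is one possible null-safe guard written for illustration; it is not claimed to be the change that went into 2.2.2-SNAPSHOT.

```java
import java.util.Objects;

public class NullSafeTypeLookup {

    // Hypothetical stand-in for the enum quoted in the report; not the Tika class.
    public enum DocType {
        KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation"),
        NUMBERS("http://developer.apple.com/namespaces/ls", "document"),
        PAGES("http://developer.apple.com/namespaces/sl", "document"),
        ENCRYPTED(null, null); // null namespace and part, as in the report

        final String namespace;
        final String part;

        DocType(String namespace, String part) {
            this.namespace = namespace;
            this.part = part;
        }
    }

    /** Mirrors the reported loop: calls equals() on a field that may be null. */
    public static DocType unsafeLookup(String uri, String local) {
        for (DocType type : DocType.values()) {
            if (type.namespace.equals(uri) && type.part.equals(local)) {
                return type;
            }
        }
        return null;
    }

    /** Skips constants that have no namespace and compares null-safely. */
    public static DocType safeLookup(String uri, String local) {
        for (DocType type : DocType.values()) {
            if (type.namespace == null) {
                continue; // ENCRYPTED cannot be matched by root element
            }
            if (Objects.equals(type.namespace, uri) && Objects.equals(type.part, local)) {
                return type;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A root element <index> without a namespace yields an empty namespace URI.
        System.out.println(safeLookup("", "index"));
        try {
            unsafeLookup("", "index");
        } catch (NullPointerException e) {
            System.out.println("unsafeLookup threw a NullPointerException");
        }
    }
}
```

A root element that matches KEYNOTE, NUMBERS, or PAGES returns before the loop reaches ENCRYPTED, which is consistent with the report that the NPE appears only when none of the three matches.
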



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
