[ https://issues.apache.org/jira/browse/TIKA-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-3639.
-------------------------------
Fix Version/s: 2.2.2
Resolution: Fixed
> NullPointerException thrown when parsing zip file
> --------------------------------------------------
>
> Key: TIKA-3639
> URL: https://issues.apache.org/jira/browse/TIKA-3639
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.2.0, 2.2.1
> Reporter: Kaka Lee
> Assignee: Tim Allison
> Priority: Blocker
> Fix For: 2.2.2
>
> Attachments: 123.zip, IWORKDocumentType.png, detectype.png,
> exception.png
>
>
> Detecting the type of the attached zip file always throws a NullPointerException.
> It can be reproduced with the following steps:
> # Create a zip file containing an index.xml with this minimal content:
> {code:xml}
> <?xml version='1.0' encoding='UTF-8' ?>
> <index>
> </index>
> {code}
>
> # Add the dependencies to pom.xml; the key dependency is *tika-parser-apple-module*:
> {code:xml}
> <dependencies>
>   <dependency>
>     <groupId>org.apache.tika</groupId>
>     <artifactId>tika-core</artifactId>
>     <version>2.2.1</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.tika</groupId>
>     <artifactId>tika-parsers</artifactId>
>     <type>pom</type>
>     <version>2.2.1</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.tika</groupId>
>     <artifactId>tika-parser-apple-module</artifactId>
>     <version>2.2.1</version>
>   </dependency>
> </dependencies>
> {code}
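> # Call tika.detect() on an InputStream of the zip file; it throws an NPE:
> {code:java}
> String filePath = "123.zip";
> Tika tika = new Tika();
> try (InputStream stream = new FileInputStream(new File(filePath))) {
>     String type = tika.detect(stream); // throws NullPointerException on 2.2.0/2.2.1
> }
> {code}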
> Note that tika.detect(String name) works normally and returns
> "application/zip", since it only uses the file name extension; the NPE occurs
> only with tika.detect(InputStream stream), which inspects the content.
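> A zip file like the one in step 1 can also be generated programmatically. The
> following is a minimal sketch using only java.util.zip; MakeRepro, writeZip and
> the default output path are hypothetical names chosen for illustration, and the
> archive contains only the index.xml from step 1. Passing the result to
> tika.detect(InputStream) is assumed to trigger the same NPE on 2.2.0/2.2.1.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: writes a zip whose only entry is the minimal index.xml from step 1.
class MakeRepro {
    // Same content as the index.xml in step 1: a root <index> element with no namespace.
    static final String INDEX_XML =
            "<?xml version='1.0' encoding='UTF-8' ?>\n"
                    + "<index>\n"
                    + "</index>\n";

    static Path writeZip(Path target) throws IOException {
        // Closing the ZipOutputStream finishes the archive and closes the file stream.
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(target))) {
            zip.putNextEntry(new ZipEntry("index.xml"));
            zip.write(INDEX_XML.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Default output name matches the reporter's example; override with the first argument.
        Path zip = writeZip(Paths.get(args.length > 0 ? args[0] : "123.zip"));
        System.out.println("Wrote " + zip.toAbsolutePath());
    }
}
```

> Run it, then pass the resulting 123.zip to the snippet in step 3.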
>
> It seems that when Tika processes a zip file through {*}IWorkPackageParser{*}, it
> parses index.xml and compares the root element's namespace URI and local name
> against each IWORKDocumentType constant (Keynote, Numbers, Pages and encrypted)
> defined below. When the root element matches none of KEYNOTE, NUMBERS or PAGES,
> the loop reaches ENCRYPTED, whose namespace is null, so
> type.getNamespace().equals(uri) throws an NPE.
> The relevant source code:
> {code:java}
> KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation",
> MediaType.application("vnd.apple.keynote")),
> NUMBERS("http://developer.apple.com/namespaces/ls", "document",
> MediaType.application("vnd.apple.numbers")),
> PAGES("http://developer.apple.com/namespaces/sl", "document",
> MediaType.application("vnd.apple.pages")),
> ENCRYPTED(null, null, MediaType.application("x-tika-iworks-protected"));
> {code}
> {code:java}
> public static IWORKDocumentType detectType(InputStream stream) {
>     QName qname = new XmlRootExtractor().extractRootElement(stream);
>     if (qname != null) {
>         String uri = qname.getNamespaceURI();
>         String local = qname.getLocalPart();
>         for (IWORKDocumentType type : values()) {
>             // ENCRYPTED.getNamespace() is null, so this throws an NPE
>             // once no earlier constant has matched
>             if (type.getNamespace().equals(uri) && type.getPart().equals(local)) {
>                 return type;
>             }
>         }
>     // ... (rest of detectType omitted)
> {code}
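> The null comparison can be reproduced without Tika. The following is a
> hypothetical, simplified model, not Tika's code: DocType stands in for
> IWORKDocumentType, and only the namespace/local-name comparison from
> detectType is kept. For the sample index.xml the root element has no namespace
> (QName reports it as "") and the local name "index", so the loop falls through
> to the constant with a null namespace and throws. Comparing with
> java.util.Objects.equals is one possible guard; it is not necessarily the
> change committed for 2.2.2.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for IWORKDocumentType (not Tika's actual class).
enum DocType {
    KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation"),
    NUMBERS("http://developer.apple.com/namespaces/ls", "document"),
    PAGES("http://developer.apple.com/namespaces/sl", "document"),
    ENCRYPTED(null, null); // null namespace and part, as in the quoted enum

    private final String namespace;
    private final String part;

    DocType(String namespace, String part) {
        this.namespace = namespace;
        this.part = part;
    }

    // Mirrors the quoted loop: calls equals() on the namespace, so reaching ENCRYPTED throws.
    static DocType detectUnsafe(String uri, String local) {
        for (DocType type : values()) {
            if (type.namespace.equals(uri) && type.part.equals(local)) {
                return type;
            }
        }
        return null;
    }

    // Null-safe variant: Objects.equals(null, x) is false for any non-null x,
    // so ENCRYPTED does not match a root element with a non-null namespace URI.
    static DocType detectSafe(String uri, String local) {
        for (DocType type : values()) {
            if (Objects.equals(type.namespace, uri) && Objects.equals(type.part, local)) {
                return type;
            }
        }
        return null;
    }
}

class DetectDemo {
    public static void main(String[] args) {
        // Root element of the sample index.xml: empty namespace URI, local name "index".
        System.out.println("safe: " + DocType.detectSafe("", "index"));
        try {
            DocType.detectUnsafe("", "index");
            System.out.println("unsafe: no exception");
        } catch (NullPointerException e) {
            System.out.println("unsafe: NullPointerException");
        }
    }
}
```

> Running DetectDemo prints "safe: null" followed by "unsafe: NullPointerException".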
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)