[
https://issues.apache.org/jira/browse/TIKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283780#comment-16283780
]
Tim Allison edited comment on TIKA-2483 at 12/8/17 4:26 PM:
------------------------------------------------------------
Regression tests in prep for 1.17 found a blocker caused by a botched fix to
this issue. We need to add quite a few more specializations of zip and tar to
the check so that we avoid overwriting their mime types with "zip". Lots of
files that were identified as kmz, tika-ooxml, etc. in 1.16 are now being
identified as "zip" during the parse in 1.17-SNAPSHOT.
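The guard described above can be sketched roughly as follows. This is a minimal illustration, not the actual PackageParser code: the class name ZipTypeGuard, the method resolve(), and the three-entry set are hypothetical, and the real list of specializations is considerably longer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not Tika's PackageParser code.
final class ZipTypeGuard {

    // Illustrative, hand-maintained subset of zip specializations (assumed values).
    static final Set<String> ZIP_SPECIALIZATIONS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(
                    "application/vnd.google-earth.kmz",
                    "application/x-tika-ooxml",
                    "application/java-archive")));

    // Returns the type that should end up in the metadata: the more specific
    // existing type survives when the package-level detection only says "zip".
    static String resolve(String existingType, String detectedType) {
        if ("application/zip".equals(detectedType)
                && existingType != null
                && ZIP_SPECIALIZATIONS.contains(existingType)) {
            return existingType;
        }
        return detectedType;
    }
}
```

With this shape, a file already identified as kmz keeps that type, while a file with no prior type, or with an unrelated one, is still reported as zip.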
The current patch includes the list semi-manually, which I abhor, but I added a
test to make sure that PackageParser's list of specializations stays current
with TikaConfig's default config.
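The idea behind such a test can be sketched without Tika on the classpath. In this rough illustration, a plain child-to-supertype map stands in for the media type registry, and SpecializationCheck, specializationsOf() and missingFrom() are hypothetical names. The real test would read the registry from TikaConfig's default config instead.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a Map of child type -> supertype stands in for the registry.
final class SpecializationCheck {

    // Collects every type whose supertype chain reaches the given root type.
    static Set<String> specializationsOf(Map<String, String> supertypes, String root) {
        Set<String> result = new HashSet<>();
        for (String type : supertypes.keySet()) {
            String current = supertypes.get(type);
            int hops = 0; // guards against accidental cycles in the map
            while (current != null && hops++ <= supertypes.size()) {
                if (current.equals(root)) {
                    result.add(type);
                    break;
                }
                current = supertypes.get(current);
            }
        }
        return result;
    }

    // Returns registry specializations that the hand-maintained list lacks.
    static Set<String> missingFrom(Set<String> hardCoded,
                                   Map<String, String> supertypes,
                                   String root) {
        Set<String> missing = specializationsOf(supertypes, root);
        missing.removeAll(hardCoded);
        return missing;
    }
}
```

A test would then assert that missingFrom(...) is empty for both zip and tar, so that a newly registered specialization fails the build until it is added to the hand-maintained list.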
After 1.17 is released, we can work towards getting rid of serialization of
parsers in ForkParser, making TikaConfig serializable, or both. Until we do
that, I don't see an elegant solution.
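To illustrate why serialization gets in the way, here is a small stand-alone sketch using plain Java serialization. It does not reproduce the exact NPE in the stack trace, which is raised while the default TikaConfig is being built in the forked process. It only shows that a non-serializable object held by a serialized parser has to be transient and therefore arrives as null on the other side. Config, DemoParser and roundTrip() are hypothetical stand-ins, not Tika classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationDemo {

    // Stand-in for a config object that is NOT serializable (hypothetical).
    static final class Config {
        final String name;
        Config(String name) { this.name = name; }
    }

    // Stand-in for a parser that is shipped to another JVM via serialization.
    static final class DemoParser implements Serializable {
        private static final long serialVersionUID = 1L;
        // transient: skipped during serialization, so the copy sees null.
        transient Config config;
        DemoParser(Config config) { this.config = config; }
    }

    // Serializes and deserializes the parser in memory, as a forked process would receive it.
    static DemoParser roundTrip(DemoParser parser) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(parser);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DemoParser) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

After roundTrip(), the copy's config field is null even though the original had one assigned, so anything set on the parser before it is sent to the forked process is lost unless it is itself serializable.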
was (Author: [email protected]):
Regression tests in prep for 1.17 show that we need to add quite a few more
specializations of zip and tar to the check so that we avoid overwriting their
mime types with "zip". Lots of files that were identified as kmz, tika-ooxml,
etc. in 1.16 are now being identified as "zip" during the parse in
1.17-SNAPSHOT.
The current patch includes the list semi-manually, which I abhor, but I added a
test to make sure that PackageParser's list of specializations stays current
with TikaConfig's default config.
After 1.17 is released, we can work towards getting rid of serialization of
parsers in ForkParser, making TikaConfig serializable, or both. Until we do
that, I don't see an elegant solution.
> Using PackageParser in ForkParser causes NPE
> --------------------------------------------
>
> Key: TIKA-2483
> URL: https://issues.apache.org/jira/browse/TIKA-2483
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.16
> Reporter: TzeKai Lee
> Attachments: testForkedPackageParsing.patch
>
>
> {quote}
> Caused by: java.lang.NullPointerException
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:158)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
> at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:78)
> at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:242)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:379)
> at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:165)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {quote}
> The mediaTypeRegistry handling code in parse() of PackageParser seems to cause
> the problem because ForkParser cannot properly construct the default
> TikaConfig. Also, since TikaConfig is not serializable, there is no way to
> assign mediaTypeRegistry/bufferedMediaTypeRegistry before calling parse().