[
https://issues.apache.org/jira/browse/TIKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923367#comment-17923367
]
Tim Allison commented on TIKA-4375:
-----------------------------------
Yes, the BMP issue is weird: {{New BMP version not implemented yet.}} This zip
file has most of the BMPs that caused problems:
{{commoncrawl3/JK/JKMFT7XDUF7VRB6WH4D6ECD6DE6MX32T}}. It is trivially
reproducible.
I'll take a look.
I'm less concerned about the JSON files because we have a hard time detecting JSON
without a filename hint. The encoding difference (which I acknowledge is wrong)
comes from the updated encoding detector. I don't like it, but I'm not sure
there's much we can do.
> Regression tests for 2.9.3 release
> ----------------------------------
>
> Key: TIKA-4375
> URL: https://issues.apache.org/jira/browse/TIKA-4375
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Attachments: 43R5U3BXJUDJXDZ25OAE33ZU47362WLV.zip,
> LTWA2JGVJGJ5RVKHTUX6SDS4NTL5UJVQ-p139.pdf, RYT4H6OCPKZPFG3YK5PGLETS6Q3SBUDV,
> reports-tika-2.9.3-rc1.tgz, tika-2.9.2-v-tika-2.9.3-reports.tgz
>
>