[
https://issues.apache.org/jira/browse/TIKA-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Johan van der Knijff updated TIKA-2468:
---------------------------------------
Description:
While running the Tika detector on some old Quattro Pro for DOS spreadsheets, I
noticed these files are identified as "application/x-123" (Lotus 1-2-3). This
happens because the magic patterns for "application/x-123" only cover the
first 4 bytes, and one of them collides with the Quattro Pro for DOS magic
pattern. I've created a patch that includes more specific mimetype definitions
and magic patterns for both Lotus 1-2-3 and Quattro Pro. Patch is on its way!
[Pull request|https://github.com/apache/tika/pull/209/files]
was: While running the Tika detector on some old Quattro Pro for DOS
spreadsheets, I noticed these files are identified as "application/x-123"
(Lotus 1-2-3). This happens because the magic patterns for
"application/x-123" only cover the first 4 bytes, and one of them collides
with the Quattro Pro for DOS magic pattern. I've created a patch that includes
more specific mimetype definitions and magic patterns for both Lotus 1-2-3 and
Quattro Pro. Patch is on its way!
> Improved detection of Lotus 1-2-3 and Quattro Pro spreadsheets
> --------------------------------------------------------------
>
> Key: TIKA-2468
> URL: https://issues.apache.org/jira/browse/TIKA-2468
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Affects Versions: 1.16
> Reporter: Johan van der Knijff
> Priority: Minor
>
> While running the Tika detector on some old Quattro Pro for DOS spreadsheets,
> I noticed these files are identified as "application/x-123" (Lotus 1-2-3).
> This happens because the magic patterns for "application/x-123" only cover
> the first 4 bytes, and one of them collides with the Quattro Pro for DOS
> magic pattern. I've created a patch that includes more specific mimetype
> definitions and magic patterns for both Lotus 1-2-3 and Quattro Pro. Patch is
> on its way!
> [Pull request|https://github.com/apache/tika/pull/209/files]
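
A minimal sketch of the collision described above, not the code or the patterns
from the patch. It assumes that both formats open with the same 4-byte record
header and differ only in the two version bytes that follow. The byte values,
the sample headers, the function names and the labels returned by
detect_specific are all illustrative assumptions; only "application/x-123" is
taken from the report.

```python
# Illustrative only: these byte values are assumptions made for this sketch,
# not the magic patterns defined in Tika or in the patch.
SHORT_LOTUS_MAGIC = bytes.fromhex("00000200")  # first 4 bytes only

# Hypothetical longer signatures: the shared 4-byte prefix plus 2 version bytes.
SPECIFIC_SIGNATURES = [
    (bytes.fromhex("000002000604"), "Lotus 1-2-3"),
    (bytes.fromhex("000002002051"), "Quattro Pro for DOS"),
]


def detect_short(header: bytes) -> str:
    """Match on the first 4 bytes only, which is where the collision arises."""
    if header.startswith(SHORT_LOTUS_MAGIC):
        return "application/x-123"
    return "application/octet-stream"


def detect_specific(header: bytes) -> str:
    """Match on 6 bytes so the two formats can be told apart."""
    for magic, label in SPECIFIC_SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


# Hypothetical file headers used to exercise both detectors.
lotus_header = bytes.fromhex("0000020006040600")
quattro_header = bytes.fromhex("0000020020510600")

for name, header in (("lotus", lotus_header), ("quattro", quattro_header)):
    print(name, detect_short(header), detect_specific(header))
```

With these assumed headers, detect_short labels both files as
"application/x-123", while detect_specific separates them because it also
compares the bytes after the shared prefix.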