[ https://issues.apache.org/jira/browse/TIKA-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Johan van der Knijff updated TIKA-2468:
---------------------------------------
    Description:
While running the Tika detector on some old Quattro Pro for DOS spreadsheets, I noticed these files are identified as "application/x-123" (Lotus 1-2-3). This happens because the magic patterns for "application/x-123" only cover the first 4 bytes, and one of these patterns collides with the Quattro Pro for DOS magic pattern. I've created a patch which includes more specific mimetype definitions and magic patterns for both Lotus 1-2-3 and Quattro Pro. Patch is on its way!

[Pull request|https://github.com/apache/tika/pull/209/files]

  was:
While running the Tika detector on some old Quattro Pro for DOS spreadsheets, I noticed these files are identified as "application/x-123" (Lotus 1-2-3). This happens because the magic patterns for "application/x-123" only cover the first 4 bytes, and one of these patterns collides with the Quattro Pro for DOS magic pattern. I've created a patch which includes more specific mimetype definitions and magic patterns for both Lotus 1-2-3 and Quattro Pro. Patch is on its way!

> Improved detection of Lotus 1-2-3 and Quattro Pro spreadsheets
> --------------------------------------------------------------
>
>                 Key: TIKA-2468
>                 URL: https://issues.apache.org/jira/browse/TIKA-2468
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.16
>            Reporter: Johan van der Knijff
>            Priority: Minor
>
> While running the Tika detector on some old Quattro Pro for DOS spreadsheets,
> I noticed these files are identified as "application/x-123" (Lotus 1-2-3).
> This happens because the magic patterns for "application/x-123" only cover
> the first 4 bytes, and one of these patterns collides with the Quattro Pro
> for DOS magic pattern. I've created a patch which includes more specific
> mimetype definitions and magic patterns for both Lotus 1-2-3 and
> Quattro Pro. Patch is on its way!
> [Pull request|https://github.com/apache/tika/pull/209/files]

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
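
The collision described in this issue can be illustrated with a small prefix-matching sketch. This is not Tika's detector and not the contents of the patch. The `detect` helper, the rule lists, the byte signatures, and the "application/x-quattro-pro" type name are all assumptions chosen for illustration. The point is only that a 4-byte rule cannot separate two formats that share their first 4 bytes, while longer rules can.

```python
# Hypothetical sketch of magic-byte detection; all byte values below are
# illustrative assumptions, not the signatures used in the actual patch.

# A short rule: only the first 4 bytes are checked.
SHORT_RULES = [
    (bytes.fromhex("00000200"), "application/x-123"),
]

# Longer rules: the extra bytes distinguish the two formats.
LONG_RULES = [
    (bytes.fromhex("000002000604"), "application/x-123"),          # assumed Lotus 1-2-3 header
    (bytes.fromhex("000002002051"), "application/x-quattro-pro"),  # assumed Quattro Pro for DOS header
]


def detect(header: bytes, rules) -> str:
    """Return the type of the first rule whose magic bytes prefix the header."""
    for magic, mime in rules:
        if header.startswith(magic):
            return mime
    return "application/octet-stream"


# Sample headers built from the assumed signatures above.
quattro_header = bytes.fromhex("000002002051") + b"\x00" * 10
lotus_header = bytes.fromhex("0000020006040600")

print(detect(quattro_header, SHORT_RULES))  # misidentified by the 4-byte rule
print(detect(quattro_header, LONG_RULES))   # separated by the longer rule
print(detect(lotus_header, LONG_RULES))
```

With the short rule, the sample Quattro Pro header is reported as Lotus 1-2-3 because both begin with the same 4 bytes. With the longer rules, each header matches only its own entry.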