[ 
https://issues.apache.org/jira/browse/TIKA-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721512#comment-17721512
 ] 

Tim Allison commented on TIKA-4037:
-----------------------------------

We added detection for this file format.  

However, the file that was shared with me privately causes commons-compress 
to identify it as a magic-less tar file.

As a last-resort fallback, if no magic is found in the file, commons-compress 
tries to read the first record as if from a tar file and then verifies the 
checksum.  In the file that was shared with me, the first "entry" has a length 
of 0, so the checksum is correctly 0.  If we're able to share the triggering 
file, we may want to ask commons-compress whether they'd be willing to make 
their detection a bit stricter and ignore entries with length 0 when they 
confirm the checksum.
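
For anyone following along, here is a minimal sketch of the kind of check 
described above.  It is not the commons-compress code; it only assumes the 
standard ustar header layout (512-byte record, 8-byte octal checksum field at 
offset 148, summed with that field counted as spaces).  The class and method 
names are made up for illustration.

```java
// Illustrative sketch only: NOT the commons-compress implementation.
final class TarChecksumSketch {
    static final int RECORD_SIZE = 512;    // size of one tar header record
    static final int CHKSUM_OFFSET = 148;  // checksum field offset (ustar layout)
    static final int CHKSUM_LEN = 8;       // checksum field length

    // Unsigned sum of the header bytes, counting the checksum field as spaces.
    static long computeChecksum(byte[] header) {
        long sum = 0;
        for (int i = 0; i < RECORD_SIZE; i++) {
            boolean inField = i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LEN;
            sum += inField ? ' ' : (header[i] & 0xff);
        }
        return sum;
    }

    // Parse the stored octal value; skip leading spaces, stop at NUL or space.
    // Returns -1 if a non-octal character is found.
    static long parseStoredChecksum(byte[] header) {
        int i = CHKSUM_OFFSET;
        int end = CHKSUM_OFFSET + CHKSUM_LEN;
        while (i < end && header[i] == ' ') {
            i++;
        }
        long value = 0;
        for (; i < end; i++) {
            byte b = header[i];
            if (b == 0 || b == ' ') {
                break;
            }
            if (b < '0' || b > '7') {
                return -1;
            }
            value = (value << 3) + (b - '0');
        }
        return value;
    }

    // The only evidence used here is that the stored and computed sums agree.
    static boolean looksLikeTarHeader(byte[] header) {
        return header.length >= RECORD_SIZE
                && parseStoredChecksum(header) == computeChecksum(header);
    }

    // Helper for building test headers: six octal digits, NUL, space.
    static void writeChecksum(byte[] header) {
        String octal = String.format("%06o", computeChecksum(header));
        for (int i = 0; i < 6; i++) {
            header[CHKSUM_OFFSET + i] = (byte) octal.charAt(i);
        }
        header[CHKSUM_OFFSET + 6] = 0;
        header[CHKSUM_OFFSET + 7] = (byte) ' ';
    }
}
```

The point of the sketch is that a checksum match on a single record is fairly 
weak evidence, which is why a stricter check on the commons-compress side 
might help.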

Within Tika, the problem is that the other detectors run before the magic 
detector, and unless they come up with {{octet-stream}} or a base type of what 
the magic detector finds, the magic detector's result is ignored.

We implicitly trust the other detectors and ignore the magic detection if an 
earlier detector has found something.  I'm not sure there's an easy 
improvement on the Tika side.
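
To make the ordering problem concrete, here is a simplified model of the 
behavior described above.  It is a sketch, not Tika's actual code: media types 
are plain strings, the parent map is a hypothetical stand-in for the real type 
registry, and {{image/x-example-bitmap}} is a placeholder name rather than the 
type we actually register.

```java
// Simplified model of "a later result only wins if it refines the current one".
// Illustrative only: NOT Tika's implementation.
final class DetectorOrderSketch {
    static final String OCTET_STREAM = "application/octet-stream";

    // Hypothetical child -> parent relationships, assumed for this example.
    static final java.util.Map<String, String> PARENT =
            java.util.Map.of("application/x-gtar", "application/x-tar");

    // True if candidate is a strict refinement of current.
    static boolean isSpecializationOf(String candidate, String current) {
        if (candidate.equals(current)) {
            return false;
        }
        if (current.equals(OCTET_STREAM)) {
            return true; // everything refines octet-stream
        }
        for (String t = PARENT.get(candidate); t != null; t = PARENT.get(t)) {
            if (t.equals(current)) {
                return true;
            }
        }
        return false;
    }

    // Results arrive in detector order; the magic detector's result is last.
    static String combine(java.util.List<String> resultsInOrder) {
        String type = OCTET_STREAM;
        for (String detected : resultsInOrder) {
            if (isSpecializationOf(detected, type)) {
                type = detected;
            }
        }
        return type;
    }
}
```

In this model, {{combine(List.of("application/x-tar", 
"image/x-example-bitmap"))}} keeps {{application/x-tar}}, because the later 
result is not a refinement of the earlier one.  The magic result only wins if 
the earlier detectors returned {{octet-stream}} or one of its base types.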

> Add detection for os2 bitmap array files
> ----------------------------------------
>
>                 Key: TIKA-4037
>                 URL: https://issues.apache.org/jira/browse/TIKA-4037
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Trivial
>             Fix For: 2.8.1
>
>
> http://fileformats.archiveteam.org/wiki/OS/2_Bitmap_Array



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
