Andreas Meier created TIKA-2574:

             Summary: Extend PCX detection in tika-mimetypes.xml
                 Key: TIKA-2574
             Project: Tika
          Issue Type: Sub-task
          Components: detector
    Affects Versions: 1.17
            Reporter: Andreas Meier
         Attachments: IUC10-da-Q.UTF-16LE.without-BOM, 
IUC10-da-Q.UTF-32LE.without-BOM, IUC10-da.UTF-16LE.without-BOM, 
IUC10-it.UTF-16LE.without-BOM, Test.pcx, Test_without_filehandle

The matcher for PCX should be reworked to avoid false positives on UTF-16LE 
and UTF-32LE text files.

I suggest adding the filler from the header as mentioned in the original [pcx 

<mime-type type="image/vnd.zbrush.pcx">
  <_comment>ZSoft Paintbrush PiCture eXchange</_comment>
  <alias type="image/x-pcx"/>
  <alias type="image/x-pc-paintbrush"/>
  <magic priority="40">
  <match value="0x0A" type="string" offset="0">
    <!-- bytes 74 to 127 are blank to fill out the 128 byte header. Set all bytes to 0 -->
    <!-- This has to be set to avoid false positives for text/plain;charset=UTF-16LE and text/plain;charset=UTF-32LE -->
 type="string" offset="74">
      <match value="0x00" type="string" offset="1"/>
      <match value="0x02" type="string" offset="1"/>
      <match value="0x03" type="string" offset="1"/>
      <match value="0x04" type="string" offset="1"/>
      <match value="0x05" type="string" offset="1"/>

<glob pattern="*.pcx"/>
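
To illustrate why the filler check helps, here is a minimal Python sketch of the idea, not Tika's actual detection code. The function names `loose_match` and `proposed_match` are hypothetical, and `loose_match` (first two bytes only) is just an assumed stand-in for a matcher without the filler test. A UTF-16LE or UTF-32LE text file that starts with a newline begins with 0x0A 0x00, which looks like a PCX manufacturer byte followed by version 0.

```python
# Sketch only: mirrors the nesting of the proposed <match> elements,
# it is not how Tika evaluates magic internally.
PCX_HEADER_SIZE = 128   # fixed header length
FILLER_OFFSET = 74      # filler runs from offset 74 to 127 (54 bytes)
VALID_VERSIONS = {0x00, 0x02, 0x03, 0x04, 0x05}


def loose_match(data: bytes) -> bool:
    """Hypothetical loose check: manufacturer byte 0x0A plus a known version byte."""
    return len(data) >= 2 and data[0] == 0x0A and data[1] in VALID_VERSIONS


def proposed_match(data: bytes) -> bool:
    """Loose check plus the requirement that the header filler is all zero."""
    if len(data) < PCX_HEADER_SIZE or not loose_match(data):
        return False
    filler = data[FILLER_OFFSET:PCX_HEADER_SIZE]
    return not any(filler)  # True only if every filler byte is 0x00


# Text starting with a newline: first bytes are 0x0A 0x00 in both encodings.
text = "\n" + "x" * 100
utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")

# Synthetic PCX-like header: manufacturer, version 5, encoding, bits per pixel,
# then zeros for the remaining fields (made-up data, not a real image).
pcx_header = bytes([0x0A, 0x05, 0x01, 0x08]) + bytes(124)
```

With these inputs the loose check accepts both text samples, while the filler check rejects them and still accepts the synthetic PCX header. Text that happens to contain NUL characters throughout offsets 74 to 127 would still slip through, so this reduces false positives rather than eliminating them.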

I have attached some test files.

[~gagravarr] Can you please check this?

This message was sent by Atlassian JIRA
