[ 
https://issues.apache.org/jira/browse/TIKA-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suman Kashyap updated TIKA-1892:
--------------------------------
    Description: 
Our FHT analysis of mobipocket-ebook and shapefile samples shows a high 
correlation among the initial header bytes. Further inspection of these file 
types, using files available online and the TREC polar data sets, revealed 
common bytes that can be used for MIME identification.

Patch content:
<mime-type type="application/x-netcdf">
  <acronym>NETCDF</acronym>
  <_comment>Network Common Data Format</_comment>
  <magic priority="60">
      <match value="CDF" type="string" offset="0" />
  </magic>
  <glob pattern="*.nc"/>
</mime-type>
<mime-type type="application/x-mobipocket-ebook">
  <acronym>MOBI</acronym>
  <_comment>Mobipocket Ebook</_comment>
  <magic priority="60">
      <match value="BOOKMOBI" type="string" offset="23" />
  </magic>
  <glob pattern="*.mobi"/>
</mime-type>
<mime-type type="application/x-shapefile">
  <acronym>ESRI Shapefiles</acronym>
  <_comment>ESRI Shapefiles</_comment>
  <magic priority="60">
      <match value="0x0000270a" type="big32" offset="2" />
  </magic>
  <glob pattern="*.shp"/>
</mime-type>
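
To make the matching rules in the patch concrete, here is a minimal Python sketch. It assumes that type="string" means the raw ASCII bytes must appear at the given offset and that type="big32" means a 4-byte big-endian unsigned integer at the given offset. The names RULES, rule_matches, and detect are hypothetical helpers for illustration only; this is not Tika's implementation, and the values and offsets are copied from the patch as posted.

```python
import struct

# Rules transcribed from the patch: (MIME type, match type, value, offset).
# Values and offsets are taken as posted and are not independently verified.
RULES = [
    ("application/x-netcdf", "string", b"CDF", 0),
    ("application/x-mobipocket-ebook", "string", b"BOOKMOBI", 23),
    ("application/x-shapefile", "big32", 0x0000270A, 2),
]


def rule_matches(data: bytes, kind: str, value, offset: int) -> bool:
    """Return True if `data` satisfies a single magic rule (hypothetical helper)."""
    if kind == "string":
        # Compare the byte slice at `offset` with the expected literal.
        return data[offset:offset + len(value)] == value
    if kind == "big32":
        chunk = data[offset:offset + 4]
        # Input shorter than offset + 4 bytes cannot match a 32-bit rule.
        return len(chunk) == 4 and struct.unpack(">I", chunk)[0] == value
    raise ValueError(f"unsupported match type: {kind}")


def detect(data: bytes):
    """Return the first MIME type whose rule matches `data`, else None."""
    for mime, kind, value, offset in RULES:
        if rule_matches(data, kind, value, offset):
            return mime
    return None
```

Calling detect() on the first 64 bytes of known sample files (for example, detect(open(path, "rb").read(64)), where path is a placeholder) is one quick way to check the proposed values and offsets against real data.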
 


  was:Our FHT analysis of mobipocket-ebook and shapefile samples shows a high 
correlation among the initial header bytes. Further inspection of these file 
types, using files available online and the TREC polar data sets, revealed 
common bytes that can be used for MIME identification.


> Mime Magic for application/x-mobipocket-ebook and application/x-shapefile
> -------------------------------------------------------------------------
>
>                 Key: TIKA-1892
>                 URL: https://issues.apache.org/jira/browse/TIKA-1892
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.12
>            Reporter: Suman Kashyap
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
