[ https://issues.apache.org/jira/browse/TIKA-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970626#action_12970626 ]

Benjamin Douglas commented on TIKA-570:
---------------------------------------

What about this field (from the Wikipedia article on the BMP file format):

Offset: 0x1A (26)
Size: 2 bytes
Purpose: the number of color planes being used. Must be set to 1.
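A minimal Java sketch of reading that field, assuming the standard 14-byte file 
header followed by a 40-byte BITMAPINFOHEADER (so the planes field sits at 
offset 0x1A). BMP stores multi-byte integers little-endian, so the value 1 is 
written as the bytes 0x01 0x00. The class and method names are illustrative 
only, not part of Tika.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative helper (hypothetical name, not a Tika class).
class BmpPlanesField {
    // Offset of the color-planes field for the 40-byte BITMAPINFOHEADER
    // (and the later header versions that extend it).
    static final int PLANES_OFFSET = 0x1A;

    // Reads the unsigned 16-bit little-endian value stored at PLANES_OFFSET.
    static int readPlanes(byte[] header) {
        if (header.length < PLANES_OFFSET + 2) {
            throw new IllegalArgumentException("header too short");
        }
        return ByteBuffer.wrap(header)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getShort(PLANES_OFFSET) & 0xFFFF;
    }

    public static void main(String[] args) {
        // 54 bytes = 14-byte file header + 40-byte BITMAPINFOHEADER.
        byte[] header = new byte[54];
        // Encode planes = 1 little-endian: low byte first, then high byte.
        ByteBuffer.wrap(header)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putShort(PLANES_OFFSET, (short) 1);

        // prints "bytes at 0x1A: 0x01 0x00"
        System.out.printf("bytes at 0x1A: 0x%02X 0x%02X%n",
                header[PLANES_OFFSET], header[PLANES_OFFSET + 1]);
        // prints "planes = 1"
        System.out.println("planes = " + readPlanes(header));
    }
}
```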

Since the field must be 1, there is always a two-byte 0x01 0x00 sequence (1 as a 
little-endian 16-bit integer) at a specific offset toward the beginning of the 
file. The field is in the header. Granted, there are different versions of the 
header, but the description in the article makes it look like the majority of 
headers have this, possibly modulo OS/2 flavors. The pattern 0x01 0x00 is 
unlikely to appear in most plain text, especially text that begins with ASCII, 
since ASCII text does not normally contain 0x00 bytes. The BMP file in the unit 
tests has this signature, for example.
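The proposed check can be sketched as a hypothetical `looksLikeBmp` helper (not 
Tika's actual detection code) that requires both the standard "BM" signature at 
offset 0 and a planes value of 1. It assumes the caller passes at least the 
first 28 bytes of the file. It also handles the 12-byte OS/2 1.x core header, 
whose 16-bit width and height fields move the planes field to offset 0x16.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sniffer sketch, not Tika's detector.
class BmpSniffer {
    // Accept only if the prefix starts with "BM" AND the planes field == 1.
    static boolean looksLikeBmp(byte[] prefix) {
        // 0x1C bytes are enough to read the planes field at 0x1A (or 0x16).
        if (prefix.length < 0x1C || prefix[0] != 'B' || prefix[1] != 'M') {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(prefix).order(ByteOrder.LITTLE_ENDIAN);
        // The DIB header size is stored at 0x0E. The 12-byte OS/2 1.x core
        // header uses 16-bit width/height, so its planes field is at 0x16.
        int dibHeaderSize = buf.getInt(0x0E);
        int planesOffset = (dibHeaderSize == 12) ? 0x16 : 0x1A;
        return (buf.getShort(planesOffset) & 0xFFFF) == 1;
    }

    public static void main(String[] args) {
        // Plain ASCII text that happens to begin with "BM".
        byte[] text = "BM is the magic number, but this is just a sentence of text."
                .getBytes(StandardCharsets.US_ASCII);

        // Minimal 54-byte header: "BM", 40-byte BITMAPINFOHEADER, planes = 1.
        byte[] bmp = new byte[54];
        ByteBuffer h = ByteBuffer.wrap(bmp).order(ByteOrder.LITTLE_ENDIAN);
        h.put(0, (byte) 'B').put(1, (byte) 'M');
        h.putInt(0x0E, 40);
        h.putShort(0x1A, (short) 1);

        System.out.println("text looks like BMP: " + looksLikeBmp(text)); // false
        System.out.println("bmp  looks like BMP: " + looksLikeBmp(bmp));  // true
    }
}
```

Because ASCII text contains no 0x00 bytes, the two bytes read as the planes 
field of a text file can never decode to 1, so a text file that merely starts 
with "BM" is rejected by this check.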

> If this is a BMP, my name is horatio alger
> ------------------------------------------
>
>                 Key: TIKA-570
>                 URL: https://issues.apache.org/jira/browse/TIKA-570
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.8
>            Reporter: Benson Margulies
>         Attachments: C80A5295-EFC7-44DD-9A39-B882D1EC6F38.txt, 
> C80A5295-EFC7-44DD-9A39-B882D1EC6F38.txt
>
>
> I am attaching a file which Tika is identifying as a bmp. It contains 
> ordinary text.
>  
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.image.imagepar...@20a19811
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
>       at com.basistech.jug.FileHarvester.process(FileHarvester.java:204)
>       at com.basistech.jug.FileHarvester.harvestDir(FileHarvester.java:165)
>       at com.basistech.jug.FileHarvester.harvestDir(FileHarvester.java:179)
>       at com.basistech.jug.FileHarvester.harvest(FileHarvester.java:135)
>       at com.basistech.jug.FileHarvester.run(FileHarvester.java:247)
>       at java.lang.Thread.run(Thread.java:680)
> Caused by: java.lang.RuntimeException: New BMP version not implemented yet.
>       at com.sun.imageio.plugins.bmp.BMPImageReader.readHeader(BMPImageReader.java:462)
>       at com.sun.imageio.plugins.bmp.BMPImageReader.getWidth(BMPImageReader.java:174)
>       at org.apache.tika.parser.image.ImageParser.parse(ImageParser.java:75)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>       ... 8 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.