[ https://issues.apache.org/jira/browse/TIKA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194435#comment-16194435 ]
Matthew Caruana Galizia commented on TIKA-2473:
-----------------------------------------------

Magic:

byte 0: 0x0A
byte 1: either 0x00, 0x02, 0x03, 0x04 or 0x05

MIME type: image/vnd.zbrush.pcx

via: https://www.iana.org/assignments/media-types/image/vnd.zbrush.pcx

> PCX and DCX image support
> -------------------------
>
>                 Key: TIKA-2473
>                 URL: https://issues.apache.org/jira/browse/TIKA-2473
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.16
>            Reporter: Matthew Caruana Galizia
>
> It's straightforward in theory to implement support for PCX and DCX. There's
> support for both in Commons Imaging as well as in ImageIO via TwelveMonkeys.
> In practice, however, I'm not really sure how to implement support. We obviously
> want to OCR the images, but Tesseract has no support for the format. So where
> do we do the conversion to a BufferedImage? I tried to look for what is done
> to handle JBIG2 files but I can't find that anywhere.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
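
For illustration, the two-byte magic given in the comment above could be checked along these lines. This is only a sketch: the class name PcxMagic and the method looksLikePcx are hypothetical and not part of Tika, and a two-byte signature is weak enough that a real detector would probably want further checks.

```java
// Hypothetical helper (not part of Tika): tests the PCX signature
// described in the comment, i.e. byte 0 is 0x0A and byte 1 is one
// of 0x00, 0x02, 0x03, 0x04 or 0x05.
final class PcxMagic {
    private PcxMagic() {}

    static boolean looksLikePcx(byte[] header) {
        // At least two bytes are needed to evaluate the signature.
        if (header == null || header.length < 2) {
            return false;
        }
        // Byte 0 must be 0x0A.
        if (header[0] != 0x0A) {
            return false;
        }
        // Byte 1 must be one of the listed values.
        switch (header[1]) {
            case 0x00:
            case 0x02:
            case 0x03:
            case 0x04:
            case 0x05:
                return true;
            default:
                return false;
        }
    }
}
```

A caller would pass the first bytes of the stream, for example PcxMagic.looksLikePcx(new byte[] {0x0A, 0x05}).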