[ https://issues.apache.org/jira/browse/TIKA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195579#comment-16195579 ]
Nick Burch commented on TIKA-2473:
----------------------------------

I've added some test files, mime magic and detection. I've had to set the magic for PCX at a slightly lower priority to avoid some false positives.

I'll leave someone else to answer on the parser front!

> PCX and DCX image support
> -------------------------
>
> Key: TIKA-2473
> URL: https://issues.apache.org/jira/browse/TIKA-2473
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.16
> Reporter: Matthew Caruana Galizia
>
> It's straightforward in theory to implement support for PCX and DCX. There's
> support for it in Commons Imaging as well as in ImageIO via TwelveMonkeys.
> In practice, however, I'm not really sure how to implement support. We obviously
> want to OCR the images, but Tesseract has no support for the format. So where
> do we do the conversion to a BufferedImage? I tried to look for what is done
> to handle JBIG2 files but I can't find that anywhere.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
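
For anyone picking up the parser side, here is a minimal sketch of the conversion step the issue asks about: decode to a BufferedImage via ImageIO, then re-encode as PNG so Tesseract can read it. It is not how Tika handles this today. The class name `ImageToPng` and the method `toPng` are hypothetical, and the sketch assumes an ImageIO reader for PCX (for example the TwelveMonkeys plugin mentioned above) is on the classpath. Without one, `ImageIO.read` returns null and the method throws.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika.
public class ImageToPng {

    /**
     * Decodes an image with whichever ImageIO readers are registered
     * and re-encodes it as PNG bytes.
     */
    public static byte[] toPng(InputStream in) throws IOException {
        // ImageIO.read returns null when no registered reader claims the stream,
        // which is what happens for PCX unless a plugin is on the classpath.
        BufferedImage image = ImageIO.read(in);
        if (image == null) {
            throw new IOException("No ImageIO reader available for this format");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no writer handles the requested format.
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        return out.toByteArray();
    }
}
```

The resulting PNG bytes could then be written to a temporary file and handed to the OCR step. Whether that belongs in a dedicated parser or in the existing OCR path is the open question from the issue.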