[ https://issues.apache.org/jira/browse/PDFBOX-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237864#comment-13237864 ]
Andreas Lehmkühler commented on PDFBOX-1072:
--------------------------------------------

The PDF contains JBIG2-encoded images, which are not yet supported; see PDFBOX-1067 for details.

> PDFImageWriter extracts black images from arabic PDFs
> -----------------------------------------------------
>
>                 Key: PDFBOX-1072
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1072
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 1.6.0
>            Reporter: Anton Stremoukhov
>              Labels: JBIG2
>         Attachments: page9_thumbnail.png
>
>
> When I tried to extract a JPEG image from an Arabic PDF, I got a corrupted
> file with a black area that overlays all Arabic text on each page.
> In the console I got only this debug message, with no exceptions:
> DEBUG (PDPixelMap.java:241) - ColorModel: IndexColorModel: #pixelBits = 1
> numComponents = 4 color space = java.awt.color.ICC_ColorSpace@2eeb3c84
> transparency = 2 transIndex = 1 has alpha = true isAlphaPre = false
> This is not limited to one PDF file. I have about 400-500 files that
> produce the same result.
> Code:
> PDFImageWriter writer = new PDFImageWriter();
> PDDocument document = PDDocument.load(sourceFile);
> writer.writeImage(document, "jpg", "", 1, 1, filename);
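
Since the comment above attributes the black images to JBIG2-encoded image streams, one rough way to triage a large batch of files is to look for the `/JBIG2Decode` filter name in the raw bytes of each PDF. The sketch below is only a heuristic and is not part of PDFBox: the class name `Jbig2Scan` and the method `containsJbig2Filter` are hypothetical, it uses only the Java standard library, and it assumes the filter name is written literally in the stream dictionary. It can miss names written with `#xx` escapes, and it can report a false positive if the same bytes happen to occur inside stream data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a PDFBox class): heuristic scan for the JBIG2 filter name.
final class Jbig2Scan {
    // The PDF filter name for JBIG2-compressed streams, as it appears in a stream dictionary.
    private static final byte[] NEEDLE = "/JBIG2Decode".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the literal byte sequence "/JBIG2Decode" occurs anywhere in the data.
    static boolean containsJbig2Filter(byte[] pdfBytes) {
        outer:
        for (int i = 0; i + NEEDLE.length <= pdfBytes.length; i++) {
            for (int j = 0; j < NEEDLE.length; j++) {
                if (pdfBytes[i + j] != NEEDLE[j]) {
                    continue outer; // mismatch: try the next start offset
                }
            }
            return true;
        }
        return false;
    }

    // Convenience overload that reads the whole file into memory first.
    static boolean containsJbig2Filter(Path pdf) throws IOException {
        return containsJbig2Filter(Files.readAllBytes(pdf));
    }

    public static void main(String[] args) throws IOException {
        // Each argument is treated as a path to a PDF file.
        for (String arg : args) {
            Path p = Paths.get(arg);
            String verdict = containsJbig2Filter(p) ? "JBIG2Decode marker found" : "no JBIG2Decode marker";
            System.out.println(p + ": " + verdict);
        }
    }
}
```

Running it as `java Jbig2Scan file1.pdf file2.pdf` prints one verdict per file. A positive result only suggests that the file may be affected by the limitation tracked in PDFBOX-1067; it does not confirm it.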