Shigeru Okada created PDFBOX-4934:
-------------------------------------

             Summary: Could not find referenced cmap stream Adobe-Japan1-XXXX
                 Key: PDFBOX-4934
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4934
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox
    Affects Versions: 2.0.20
         Environment: Windows 10, 64-bit
            Reporter: Shigeru Okada
         Attachments: JP.pdf, Korea.pdf

An IOException occurs when the attached PDF is fed into PDFBox. The attached file (JP.pdf) references an Adobe-Japan1-65534 CMap. The source code is below.

---
import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.commons.io.FileUtils;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

public class pdfBoxTest {

    public static void main(String[] args) throws Exception {
        pdfBoxTest sample = new pdfBoxTest();
        String pdfname = "D:/tmp/jp.pdf";
        File pdf = FileUtils.getFile(pdfname);
        // extractTextFromPDF(File) was not included in this report.
        sample.extractTextFromPDF(pdf);
        sample.load(pdf);
    }

    // Renders the first page at 300 DPI and writes it out as a JPEG.
    public void load(File pdf) throws Exception {
        // try-with-resources closes the document after rendering
        try (PDDocument document = PDDocument.load(pdf)) {
            PDFRenderer renderer = new PDFRenderer(document);
            BufferedImage bufImage = renderer.renderImageWithDPI(0, 300, ImageType.RGB);
            ImageIO.write(bufImage, "jpg", new File("D:/tmp/jp.jpg"));
        }
    }
}
---

The getExternalCMap method in the CMapParser class tries to find the external CMap, cannot find Adobe-Japan1-65534, and throws the exception. I know that no such CMap exists, but the PDF file can otherwise be opened without problems, so I think it would be better not to throw an exception and to use another CMap instead.

I temporarily modified the source code as below, and it works well.

---
protected InputStream getExternalCMap(String name) throws IOException {
    InputStream is = this.getClass().getResourceAsStream(name);
    if (is == null) {
        // Requested CMap is not bundled: fall back to a known CMap
        // of the same character collection.
        if (name.startsWith("Adobe-Japan1")) {
            name = "Adobe-Japan1-1";
        } else if (name.startsWith("Adobe-Korea1")) {
            name = "Adobe-Korea1-1";
        }
        is = this.getClass().getResourceAsStream(name);
        if (is == null) {
            throw new IOException("Error: Could not find referenced cmap stream " + name);
        }
    }
    return is;
}
---

However, this is only a workaround, not a real fix. If possible, I would like to ask you to modify the source code so that it does not throw an exception when it cannot find a CMap.

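To make the proposed behavior easier to discuss, here is a minimal sketch of the name-resolution part of the workaround, separated from resource loading so it can be run without PDFBox. The class name `CMapFallbackSketch`, the method `resolveName`, and the `exists` predicate are hypothetical and are not part of FontBox; the two fallback names are the ones used in the workaround above. Returning an empty `Optional` instead of throwing is one possible way to let the caller decide what to do when nothing is found.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class CMapFallbackSketch {

    /**
     * Picks the CMap name to load. Hypothetical helper, not FontBox API.
     *
     * @param requested the name referenced by the PDF, e.g. "Adobe-Japan1-65534"
     * @param exists    tells whether a CMap resource with that name is available
     * @return the requested name if available, else a fallback of the same
     *         character collection if available, else empty
     */
    public static Optional<String> resolveName(String requested, Predicate<String> exists) {
        if (exists.test(requested)) {
            return Optional.of(requested);
        }
        String fallback = null;
        if (requested.startsWith("Adobe-Japan1")) {
            fallback = "Adobe-Japan1-1";
        } else if (requested.startsWith("Adobe-Korea1")) {
            fallback = "Adobe-Korea1-1";
        }
        if (fallback != null && exists.test(fallback)) {
            return Optional.of(fallback);
        }
        // Nothing suitable: the caller can log a warning instead of failing.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed set of available CMaps, for illustration only.
        Set<String> bundled = new HashSet<>(Arrays.asList("Adobe-Japan1-1", "Adobe-Korea1-1"));

        System.out.println(resolveName("Adobe-Japan1-65534", bundled::contains));
        System.out.println(resolveName("Adobe-Korea1-3", bundled::contains));
        System.out.println(resolveName("Adobe-GB1-9", bundled::contains));
    }
}
```

In a real patch the predicate would be replaced by a check such as `getClass().getResourceAsStream(name) != null`, as in the workaround.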
I found another PDF file, a Korean one, which references an Adobe-Korea1-3 CMap.

<< /Supplement 3 /Registry (Adobe) /Ordering (Korea1) >>

Please refer to the attached file.

Thanks!
//Okada
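
For reference, the CMap names in this report appear to be formed by joining the three entries of the dictionary quoted above as Registry-Ordering-Supplement. The sketch below only illustrates that assumed mapping; the class `CidSystemInfoNameSketch` and the method `cmapName` are hypothetical and not taken from PDFBox.

```java
public class CidSystemInfoNameSketch {

    /**
     * Joins the three dictionary entries into a CMap name.
     * Assumption: the name is Registry-Ordering-Supplement, as the names
     * in the error messages of this report suggest.
     */
    public static String cmapName(String registry, String ordering, int supplement) {
        return registry + "-" + ordering + "-" + supplement;
    }

    public static void main(String[] args) {
        // Values from the dictionary in Korea.pdf
        System.out.println(cmapName("Adobe", "Korea1", 3));
        // Values matching the CMap name reported for JP.pdf
        System.out.println(cmapName("Adobe", "Japan1", 65534));
    }
}
```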