[ https://issues.apache.org/jira/browse/PDFBOX-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shigeru Okada updated PDFBOX-4934:
----------------------------------
    Description:
An IOException occurs when the attached PDF is fed into PDFBox.
The attached PDF file (JP.pdf) references an Adobe-Japan1-65534 CMap.
The source code is as follows.
---
    import java.awt.image.BufferedImage;
    import java.io.File;
    import javax.imageio.ImageIO;
    import org.apache.commons.io.FileUtils;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.rendering.ImageType;
    import org.apache.pdfbox.rendering.PDFRenderer;
    import org.apache.pdfbox.text.PDFTextStripper;
    import org.apache.pdfbox.text.TextPosition;

    public class pdfBoxTest {
        public static void main(String[] args) throws Exception {
            pdfBoxTest sample = new pdfBoxTest();
            String pdfname = "D:/tmp/jp.pdf";
            File pdf = FileUtils.getFile(pdfname);
            sample.extractTextFromPDF(pdf); // method body not included in this report
            sample.load(pdf);
        }

        // Renders the first page at 300 DPI and writes it out as a JPEG.
        public void load(File pdf) throws Exception {
            try (PDDocument document = PDDocument.load(pdf)) {
                PDFRenderer renderer = new PDFRenderer(document);
                BufferedImage bufImage = renderer.renderImageWithDPI(0, 300, ImageType.RGB);
                ImageIO.write(bufImage, "jpg", new File("D:/tmp/jp.jpg"));
            }
        }
    }
-----
The getExternalCMap method in CMapParser tries to find the external CMap, cannot find Adobe-Japan1-65534, and throws an exception.
I know that no such CMap exists, but the PDF file can otherwise be opened without problems, so I think it would be better to use another CMap instead of throwing an exception.
As a temporary workaround I modified the source code as shown below, and it works well.

    protected InputStream getExternalCMap(String name) throws IOException {
        InputStream is = this.getClass().getResourceAsStream(name);
        if (is == null) {
            // Requested CMap is not bundled: fall back to supplement 1
            // of the same registry and ordering.
            if (name.startsWith("Adobe-Japan1")) {
                name = "Adobe-Japan1-1";
            } else if (name.startsWith("Adobe-Korea1")) {
                name = "Adobe-Korea1-1";
            }
            is = this.getClass().getResourceAsStream(name);
            if (is == null) {
                throw new IOException("Error: Could not find referenced cmap stream " + name);
            }
        }
        return is;
    }

However, this is not a fundamental fix.
If possible, I would like to ask you to change the source code so that it does not throw an exception when it cannot find a CMap.

I also found a Korean PDF file that references an Adobe-Korea1-3 CMap:
<<
/Supplement 3
/Registry (Adobe)
/Ordering (Korea1)
>>
Please refer to the attached file.

Thanks!
//Okada

> Could not find referenced cmap stream Adobe-Japan1-XXXX
> -------------------------------------------------------
>
>                 Key: PDFBOX-4934
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4934
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>   Affects Versions: 2.0.20
>         Environment: Windows10, 64bit
>            Reporter: Shigeru Okada
>            Priority: Major
>         Attachments: JP.pdf, Korea.pdf
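The fallback idea in the workaround can be tried in isolation, without touching PDFBox itself. The following is only a minimal sketch under stated assumptions: CMapNameFallback, resolve, and the isAvailable predicate are hypothetical names that are not part of PDFBox or FontBox, and the set of "bundled" CMap names in main is illustrative rather than the real resource list. It returns an empty Optional instead of throwing, which leaves the caller free to decide how to handle a CMap that cannot be found.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical helper for illustration only; not part of PDFBox or FontBox. */
public class CMapNameFallback {

    // Registry-ordering prefixes mentioned in the report.
    private static final String[] PREFIXES = {"Adobe-Japan1", "Adobe-Korea1"};

    /**
     * Returns the requested name if it is available; otherwise the
     * "<prefix>-1" name for a known prefix if that one is available;
     * otherwise an empty Optional (no exception is thrown).
     */
    public static Optional<String> resolve(String name, Predicate<String> isAvailable) {
        if (isAvailable.test(name)) {
            return Optional.of(name);
        }
        for (String prefix : PREFIXES) {
            if (name.startsWith(prefix)) {
                String fallback = prefix + "-1";
                if (isAvailable.test(fallback)) {
                    return Optional.of(fallback);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed set of available CMap names, for demonstration only.
        Set<String> bundled = new HashSet<>(Arrays.asList("Adobe-Japan1-1", "Adobe-Korea1-1"));
        System.out.println(resolve("Adobe-Japan1-65534", bundled::contains));
        System.out.println(resolve("Adobe-Korea1-3", bundled::contains));
        System.out.println(resolve("Adobe-GB1-9", bundled::contains));
    }
}
```

In a real patch the predicate would presumably be backed by a resource lookup such as the getResourceAsStream call in the workaround; that wiring is left out here on purpose.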