[ https://issues.apache.org/jira/browse/PDFBOX-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated PDFBOX-4934:
------------------------------------
    Attachment: PDFBOX-4934-Korea.pdf-1.png
                PDFBOX-4934-JP.pdf-1.png

> Could not find referenced cmap stream Adobe-Japan1-XXXX
> -------------------------------------------------------
>
>                 Key: PDFBOX-4934
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4934
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.20
>         Environment: Windows10, 64bit
>            Reporter: Shigeru Okada
>            Priority: Major
>         Attachments: JP.pdf, Korea.pdf, PDFBOX-4934-JP.pdf-1.png,
> PDFBOX-4934-JP.pdf.txt, PDFBOX-4934-Korea.pdf-1.png, PDFBOX-4934-Korea.pdf.txt
>
>
> An IOException occurs when the attached PDF is fed into PDFBox.
> The attached file (JP.pdf) references the Adobe-Japan1-65534 CMap.
> The source code is below.
> ---
> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import javax.imageio.ImageIO;
> import org.apache.commons.io.FileUtils;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.ImageType;
> import org.apache.pdfbox.rendering.PDFRenderer;
> import org.apache.pdfbox.text.PDFTextStripper;
>
> public class pdfBoxTest {
>     public static void main(String[] args) throws Exception {
>         pdfBoxTest sample = new pdfBoxTest();
>         String pdfname = "D:/tmp/jp.pdf";
>         File pdf = FileUtils.getFile(pdfname);
>         sample.extractTextFromPDF(pdf);
>         sample.load(pdf);
>     }
>
>     // Not shown in the original report; assumed here to be a plain
>     // PDFTextStripper call so that the example compiles.
>     public void extractTextFromPDF(File pdf) throws Exception {
>         try (PDDocument document = PDDocument.load(pdf)) {
>             System.out.println(new PDFTextStripper().getText(document));
>         }
>     }
>
>     // Renders the first page at 300 DPI and writes it as a JPEG.
>     public void load(File pdf) throws Exception {
>         try (PDDocument document = PDDocument.load(pdf)) {
>             PDFRenderer renderer = new PDFRenderer(document);
>             BufferedImage bufImage = renderer.renderImageWithDPI(0, 300, ImageType.RGB);
>             ImageIO.write(bufImage, "jpg", new File("D:/tmp/jp.jpg"));
>         }
>     }
> }
> {code}
> The getExternalCMap method in CMapParser tries to find the external CMap,
> but it cannot find Adobe-Japan1-65534 and throws an exception.
> I know that no such CMap exists, but the PDF file can otherwise be opened
> without problems, so I think it would be better not to throw an exception
> and to use another CMap instead.
> As a temporary workaround I modified the source code as below, and it works well.
> {code:java}
> protected InputStream getExternalCMap(String name) throws IOException {
>     InputStream is = this.getClass().getResourceAsStream(name);
>     if (is == null) {
>         // Not found: retry with a fixed fallback name for the same prefix.
>         if (name.startsWith("Adobe-Japan1")) {
>             name = "Adobe-Japan1-1";
>         } else if (name.startsWith("Adobe-Korea1")) {
>             name = "Adobe-Korea1-1";
>         }
>         is = this.getClass().getResourceAsStream(name);
>         if (is == null) {
>             throw new IOException("Error: Could not find referenced cmap stream " + name);
>         }
>     }
>     return is;
> }
> {code}
> But this is not a fundamental fix.
> If possible, I would like to ask you to change the source code so that it
> does not throw an exception when it cannot find a CMap.
> I also found a Korean PDF file that references the Adobe-Korea1-3 CMap.
> Please refer to the attached file.
> Thanks!
> //Okada
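
The prefix-based fallback from the workaround quoted above can be tried in isolation, without PDFBox on the classpath. The following is only a minimal sketch: the class CMapFallback and the method fallbackName are hypothetical names that do not exist in PDFBox or FontBox, and the fallback targets are simply the ones used in the report.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of PDFBox or FontBox. */
public class CMapFallback {
    // Prefix -> fallback CMap name, taken from the workaround in the report.
    private static final String[][] FALLBACKS = {
        {"Adobe-Japan1", "Adobe-Japan1-1"},
        {"Adobe-Korea1", "Adobe-Korea1-1"},
    };

    /** Returns the fallback CMap name for a known prefix, or empty if none applies. */
    public static Optional<String> fallbackName(String name) {
        for (String[] entry : FALLBACKS) {
            if (name.startsWith(entry[0])) {
                return Optional.of(entry[1]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The two CMap names mentioned in the report, plus one with no fallback.
        System.out.println(fallbackName("Adobe-Japan1-65534").orElse("(no fallback)"));
        System.out.println(fallbackName("Adobe-Korea1-3").orElse("(no fallback)"));
        System.out.println(fallbackName("Identity-H").orElse("(no fallback)"));
    }
}
```

Returning an empty Optional instead of throwing leaves it to the caller to decide whether a missing CMap is fatal, which is the behaviour the report asks for.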