Shigeru Okada created PDFBOX-4934:
-------------------------------------

             Summary: Could not find referenced cmap stream Adobe-Japan1-XXXX
                 Key: PDFBOX-4934
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4934
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox
    Affects Versions: 2.0.20
         Environment: Windows10, 64bit
            Reporter: Shigeru Okada
         Attachments: JP.pdf, Korea.pdf

An IOException occurs when the attached PDF is fed into PDFBox.

The attached PDF file (JP.pdf) references the Adobe-Japan1-65534 CMap.
The source code is as follows.
---
import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.commons.io.FileUtils;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.text.PDFTextStripper;

public class pdfBoxTest {
        public static void main(String[] args) throws Exception {
                pdfBoxTest sample = new pdfBoxTest();

                String pdfname = "D:/tmp/jp.pdf";
                File pdf = FileUtils.getFile(pdfname);

                sample.extractTextFromPDF(pdf);
                sample.load(pdf);
        }

        // This method was not included in the report. What follows is a
        // minimal stand-in (an assumption) that extracts the document text.
        public void extractTextFromPDF(File pdf) throws Exception {
                try (PDDocument document = PDDocument.load(pdf)) {
                        System.out.println(new PDFTextStripper().getText(document));
                }
        }

        // Renders the first page at 300 DPI and writes it out as a JPEG.
        public void load(File pdf) throws Exception {
                try (PDDocument document = PDDocument.load(pdf)) {
                        PDFRenderer renderer = new PDFRenderer(document);
                        BufferedImage bufImage = renderer.renderImageWithDPI(0, 300, ImageType.RGB);

                        ImageIO.write(bufImage, "jpg", new File("D:/tmp/jp.jpg"));
                }
        }
}
---


The getExternalCMap method in the CMapParser class tries to find the external
CMap, but it cannot find Adobe-Japan1-65534 and throws an exception.

I know that no such CMap exists, but the PDF file can otherwise be opened
without problems, so I think it would be better not to throw an exception and
to use another CMap instead.
I temporarily modified the source code as below, and it works well.

protected InputStream getExternalCMap(String name) throws IOException {
    InputStream is = this.getClass().getResourceAsStream(name);
    if (is == null) {
        if (name.startsWith("Adobe-Japan1")) {
            name = "Adobe-Japan1-1";
        } else if (name.startsWith("Adobe-Korea1")) {
            name = "Adobe-Korea1-1";
        }
        is = this.getClass().getResourceAsStream(name);
        if (is == null) {
            throw new IOException("Error: Could not find referenced cmap stream " + name);
        }
    }

    return is;
}

However, this is not a fundamental fix.
If possible, I would like to ask you to modify the source code so that it
does not throw an exception when it cannot find a CMap.
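
For illustration only, here is a minimal sketch of what a non-throwing lookup could look like. CMapNameFallback and its resolve method are hypothetical names and are not part of PDFBox or FontBox. The resource lookup is replaced by a caller-supplied predicate so that the sketch runs on its own. The fallback targets (Adobe-Japan1-1 and Adobe-Korea1-1) are taken from the temporary modification above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not part of PDFBox or FontBox.
class CMapNameFallback {
    // Character collections for which the workaround substitutes a default CMap.
    private static final String[] PREFIXES = {"Adobe-Japan1", "Adobe-Korea1"};

    // Returns the name to load, or an empty Optional instead of throwing.
    // "exists" stands in for the real resource lookup (an assumption).
    static Optional<String> resolve(String name, Predicate<String> exists) {
        if (exists.test(name)) {
            return Optional.of(name);
        }
        for (String prefix : PREFIXES) {
            String fallback = prefix + "-1";
            if (name.startsWith(prefix) && exists.test(fallback)) {
                return Optional.of(fallback);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> bundled = new HashSet<>(Arrays.asList("Adobe-Japan1-1", "Adobe-Korea1-1"));
        System.out.println(resolve("Adobe-Japan1-65534", bundled::contains)); // Optional[Adobe-Japan1-1]
        System.out.println(resolve("Adobe-Korea1-3", bundled::contains));     // Optional[Adobe-Korea1-1]
        System.out.println(resolve("Adobe-GB1-9", bundled::contains));        // Optional.empty
    }
}
```

Returning an empty Optional leaves it to the caller to decide whether a missing CMap should be logged and skipped or treated as an error.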

I also found a Korean PDF file that references the Adobe-Korea1-3 CMap.

<<
/Supplement 3
/Registry (Adobe)
/Ordering (Korea1)
>>

Please refer to attached file.
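
For reference, the CMap name in the error message appears to follow the Registry-Ordering-Supplement pattern of the dictionary shown above. A minimal sketch under that assumption follows. CidSystemInfoName and cmapName are hypothetical names used only for illustration and are not PDFBox API.

```java
// Hypothetical helper for illustration; not part of PDFBox.
class CidSystemInfoName {
    // Joins the three dictionary entries into a CMap name, assuming the
    // Registry-Ordering-Supplement convention.
    static String cmapName(String registry, String ordering, int supplement) {
        return registry + "-" + ordering + "-" + supplement;
    }

    public static void main(String[] args) {
        System.out.println(cmapName("Adobe", "Korea1", 3)); // Adobe-Korea1-3
    }
}
```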

Thanks!

//Okada




