[ 
https://issues.apache.org/jira/browse/PDFBOX-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shigeru Okada updated PDFBOX-4934:
----------------------------------
    Description: 
An IOException occurs when the attached PDF is fed into PDFBox.

The attached PDF file (JP.pdf) references the Adobe-Japan1-65534 CMap.
The source code is below.
---
import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.commons.io.FileUtils;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

public class pdfBoxTest {
        public static void main(String[] args) throws Exception {
                pdfBoxTest sample = new pdfBoxTest();

                String pdfname = "D:/tmp/jp.pdf";
                File pdf = FileUtils.getFile(pdfname);

                // extractTextFromPDF is not shown in this report, so the call is commented out.
                // sample.extractTextFromPDF(pdf);
                sample.load(pdf);
        }

        public void load(File pdf) throws Exception {
                // Close the document once rendering is done.
                try (PDDocument document = PDDocument.load(pdf)) {
                        PDFRenderer renderer = new PDFRenderer(document);
                        // Render the first page (index 0) at 300 DPI.
                        BufferedImage bufImage = renderer.renderImageWithDPI(0, 300, ImageType.RGB);

                        ImageIO.write(bufImage, "jpg", new File("D:/tmp/jp.jpg"));
                }
        }
}

-----

The getExternalCMap method in CMapParser tries to find the external CMap, but
it cannot find Adobe-Japan1-65534 and throws an exception.

I know that no such CMap exists, but the PDF file itself can be opened without
problems, so I think it would be better to fall back to another CMap instead of
throwing an exception.
As a temporary workaround I modified the source code as below, and it works well.

protected InputStream getExternalCMap(String name) throws IOException {
    InputStream is = this.getClass().getResourceAsStream(name);
    if (is == null) {
        // Requested CMap is not bundled: fall back to a known supplement.
        if (name.startsWith("Adobe-Japan1")) {
            name = "Adobe-Japan1-1";
        } else if (name.startsWith("Adobe-Korea1")) {
            name = "Adobe-Korea1-1";
        }
        is = this.getClass().getResourceAsStream(name);
        if (is == null) {
            throw new IOException("Error: Could not find referenced cmap stream " + name);
        }
    }

    return is;
}

However, this workaround is not a proper fix.
If possible, I would like to ask you to modify the source code so that it does
not throw an exception when it cannot find the CMap.
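
A more general form of the fallback could look like the sketch below. It is only an illustration, not PDFBox code: the class CMapFallback, the method resolve, and the cap of 9 on the supplement number are all assumptions made for this example. It tries the requested name first, then the same registry and ordering with lower supplement numbers, and returns an empty result instead of throwing when nothing matches.

```java
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical helper (not part of PDFBox) that picks a fallback CMap name. */
final class CMapFallback {
    /** Assumed upper bound for supplement numbers worth probing. */
    private static final int MAX_FALLBACK_SUPPLEMENT = 9;

    private CMapFallback() {
    }

    /**
     * Returns the requested name if it is available. Otherwise tries the same
     * registry-ordering prefix with lower supplement numbers, highest first.
     * Returns an empty Optional when no candidate is available.
     */
    static Optional<String> resolve(String name, Predicate<String> isAvailable) {
        if (isAvailable.test(name)) {
            return Optional.of(name);
        }
        int dash = name.lastIndexOf('-');
        if (dash <= 0 || dash == name.length() - 1) {
            return Optional.empty();
        }
        String prefix = name.substring(0, dash);
        int supplement;
        try {
            supplement = Integer.parseInt(name.substring(dash + 1));
        } catch (NumberFormatException e) {
            // Not a registry-ordering-supplement name, so there is nothing to fall back to.
            return Optional.empty();
        }
        int start = Math.min(supplement - 1, MAX_FALLBACK_SUPPLEMENT);
        for (int s = start; s >= 0; s--) {
            String candidate = prefix + "-" + s;
            if (isAvailable.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

Inside getExternalCMap the predicate could be something like n -> getClass().getResource(n) != null, and the caller would decide whether an empty result should still raise an IOException or only log a warning.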

I also found a Korean PDF file that references the Adobe-Korea1-3 CMap.

<<
/Supplement 3
/Registry (Adobe)
/Ordering (Korea1)
>>
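
For reference, the CMap name in this report appears to be the three entries of that dictionary joined with hyphens. The snippet below is a minimal sketch of that assumption; the class CidSystemInfoName and its method are made up for illustration and are not PDFBox API.

```java
/** Hypothetical helper (not PDFBox API) that builds a CMap name from CIDSystemInfo entries. */
final class CidSystemInfoName {
    private CidSystemInfoName() {
    }

    /** Joins registry, ordering and supplement as "Registry-Ordering-Supplement". */
    static String cmapName(String registry, String ordering, int supplement) {
        return registry + "-" + ordering + "-" + supplement;
    }
}
```

With the dictionary above, cmapName("Adobe", "Korea1", 3) gives "Adobe-Korea1-3", which is the name the lookup fails on.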

Please refer to the attached file (Korea.pdf).

Thanks!

//Okada




> Could not find referenced cmap stream Adobe-Japan1-XXXX
> -------------------------------------------------------
>
>                 Key: PDFBOX-4934
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4934
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.20
>         Environment: Windows10, 64bit
>            Reporter: Shigeru Okada
>            Priority: Major
>         Attachments: JP.pdf, Korea.pdf
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
