Julien Ortega created PDFBOX-2740:
-------------------------------------

             Summary: Text extraction failed on Korean PDF
                 Key: PDFBOX-2740
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2740
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 1.8.9, 1.8.8, 1.8.7, 2.0.0
            Reporter: Julien Ortega


Trying to extract text on a Korean PDF gives me a lot of warnings :
WARNING: No Unicode mapping for US (33) in font 
DVCAYA+WtKoBaeumMyungjoL063zb4?Pw
avr. 01, 2015 12:05:32 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for NAK (33) in font 
JYLDGG+WtKoBaeumMyungjoL053zb4?Pw
avr. 01, 2015 12:05:32 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for RS (38) in font 
WRYULE+WtKoBaeumMyungjoL013zb4?Pw
avr. 01, 2015 12:05:32 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font FZEFOY+WtKoBaeumGothicL0422b4?Pw
avr. 01, 2015 12:05:32 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for DEL (33) in font 
FZEFOY+WtKoBaeumGothicL0422b4?Pw
avr. 01, 2015 12:05:32 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font OOLNBG+WtKoBaeumGothicL0122b4?Pw
avr. 01, 2015 12:05:32 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for SOH (33) in font 
OOLNBG+WtKoBaeumGothicL0122b4?Pw

and the result is not readable. The pdf is containing the necessary conversion 
table because every pdf reader (Desktop or Mobile) let me copy and past the 
text without problem.
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to