Sergey Makarov created PDFBOX-4549:
--------------------------------------
Summary: No Unicode mapping
Key: PDFBOX-4549
URL: https://issues.apache.org/jira/browse/PDFBOX-4549
Project: PDFBox
Issue Type: Bug
Reporter: Sergey Makarov
Attachments: XO_Thames.zip, our_star_wars.pdf
Hello, when I try to extract text from the attached PDF, I get empty output and
many warnings. The font is attached as well.
Acrobat Reader opens the file successfully, and I can select and copy text there.
My code:
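{code:java}
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

private static void parseOne(String path) throws IOException {
    File file = new File(path);
    PDFTextStripper tStripper = new PDFTextStripper();
    // Mixed mode: main-memory limit 0, storage limit 500000000 bytes,
    // temp files written to the directory below.
    MemoryUsageSetting memUsageSetting = MemoryUsageSetting.setupMixed(0, 500000000)
            .setTempDir(new File("/home/user/pdfBoxTest/newFiles/"));
    // try-with-resources closes the document even if text extraction throws.
    try (PDDocument document = PDDocument.load(file, memUsageSetting)) {
        if (!document.isEncrypted()) {
            String pdfFileInText = tStripper.getText(document);
            System.out.print(pdfFileInText);
        }
    }
}
{code}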
Warnings:
{code:java}
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+83 (83) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+116 (116) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+97 (97) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+114 (114) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+87 (87) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+115 (115) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font HPDFAB+DejaVuSansMono,Book
{code}
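As a possible diagnostic aid, here is a minimal sketch that pulls the unmapped codes out of the warnings above and prints them as characters. It is plain Java and does not use PDFBox; the class and method names (`CidLogDecoder`, `unmappedCodes`, `asText`) are hypothetical. It assumes that the number in parentheses can be read directly as a Unicode code point, which is only a guess and may not hold for this font.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CidLogDecoder {
    // Matches the warning text quoted in the log above, e.g.
    // "WARNING: No Unicode mapping for CID+83 (83) in font HPDFAA+XOThames"
    // and captures the number in parentheses.
    private static final Pattern UNMAPPED =
            Pattern.compile("No Unicode mapping for CID\\+\\d+ \\((\\d+)\\) in font");

    /** Returns the code from each matching warning line, in log order. */
    static List<Integer> unmappedCodes(List<String> logLines) {
        List<Integer> codes = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = UNMAPPED.matcher(line);
            if (m.find()) {
                codes.add(Integer.parseInt(m.group(1)));
            }
        }
        return codes;
    }

    /**
     * ASSUMPTION: treats each code as a Unicode code point.
     * This is a guess for diagnosis only, not something the log guarantees.
     */
    static String asText(List<Integer> codes) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            sb.appendCodePoint(code);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "WARNING: No Unicode mapping for CID+83 (83) in font HPDFAA+XOThames",
                "WARNING: No Unicode mapping for CID+116 (116) in font HPDFAA+XOThames",
                "WARNING: No Unicode mapping for CID+97 (97) in font HPDFAA+XOThames",
                "WARNING: No Unicode mapping for CID+114 (114) in font HPDFAA+XOThames",
                "WARNING: No Unicode mapping for CID+87 (87) in font HPDFAA+XOThames",
                "WARNING: No Unicode mapping for CID+115 (115) in font HPDFAA+XOThames");
        System.out.println(asText(unmappedCodes(log))); // prints "StarWs"
    }
}
```

Read this way, the six codes are the letters S, t, a, r, W, s, which would be consistent with the file name our_star_wars.pdf. If the assumption holds, the codes in the content stream may simply be character codes that the ToUnicode CMap fails to map.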