[ https://issues.apache.org/jira/browse/PDFBOX-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584524#comment-15584524 ]
Maruan Sahyoun commented on PDFBOX-3519:
----------------------------------------

Acrobat seems to interpret the encoding as Windows-1252 instead of ISO 8859-1. Doing the same would give us the same string that Acrobat presents, since Windows-1252 includes the printable characters in the #80 to #9F range that ISO 8859-1 lacks.

[~jahewson] WDYT about changing the behavior implemented in PDFBOX-3347 to use {{Windows-1252}} instead of {{ISO 8859-1}}, as it is currently implemented?

> COSName is not ascii
> --------------------
>
>                 Key: PDFBOX-3519
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3519
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.3
>            Reporter: simon steiner
>         Attachments: COSNameAcrobat.png
>
>
> Trunk seems ok
> PDF is from PDFBOX-783
> {code}
> public static void main( String[] args ) throws IOException {
>     PDDocument doc = PDDocument.load(new File("A02Gj780LZ.pdf"));
>     COSDictionary x = doc.getPage(0).getResources().getCOSObject();
>     read(x);
>     doc.close();
> }
> private static void read(COSBase b) {
>     if (b instanceof COSObject) {
>         read(((COSObject) b).getObject());
>     } else if (b instanceof COSDictionary) {
>         for (COSBase x : ((COSDictionary) b).getValues()) {
>             read(x);
>         }
>     } else if (b instanceof COSName) {
>         if(((COSName) b).getName().charAt(0) > 256)
>             throw new RuntimeException(((COSName) b).getName());
>     }
> }
> {code}
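
For illustration, here is a minimal sketch of the difference between the two encodings discussed in the comment above. It does not touch PDFBox at all: the class name {{NameDecodingSketch}}, the {{decode}} helper, and the sample bytes are made up for this example, and it assumes the Java runtime provides the {{windows-1252}} charset. It only shows how the same bytes in the #80 to #9F range decode under each charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of PDFBox.
class NameDecodingSketch {

    // Decode raw name bytes with the given single-byte charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Made-up name bytes: 'A', then 0x80 and 0x96 from the disputed range.
        byte[] raw = { 'A', (byte) 0x80, (byte) 0x96 };

        // Assumption: the runtime ships the windows-1252 charset.
        Charset cp1252 = Charset.forName("windows-1252");

        String latin1 = decode(raw, StandardCharsets.ISO_8859_1);
        String win = decode(raw, cp1252);

        // Print the code point each byte maps to under both charsets.
        for (int i = 0; i < raw.length; i++) {
            System.out.printf("byte 0x%02X -> ISO 8859-1 U+%04X, Windows-1252 U+%04X%n",
                    raw[i] & 0xFF, (int) latin1.charAt(i), (int) win.charAt(i));
        }
    }
}
```

Under ISO 8859-1 the bytes 0x80 and 0x96 map to the C1 control code points U+0080 and U+0096, while under Windows-1252 they map to U+20AC (euro sign) and U+2013 (en dash). ASCII bytes such as 0x41 decode identically in both.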