[ 
https://issues.apache.org/jira/browse/PDFBOX-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584524#comment-15584524
 ] 

Maruan Sahyoun commented on PDFBOX-3519:
----------------------------------------

Acrobat seems to interpret the encoding as Windows-1252 instead of ISO 8859-1.
Doing the same would give us the same string that Acrobat presents, since
Windows-1252 defines the #80 to #9F characters that ISO 8859-1 lacks.

[~jahewson] WDYT about changing the behavior implemented in PDFBOX-3347 to use 
{{Windows-1252}} instead of {{ISO 8859-1}}, which is what is currently
implemented?
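
To illustrate the difference, here is a minimal standalone sketch that uses
only the JDK. It is not PDFBox code: the class name, the {{decode}} helper and
the sample bytes are made up for illustration. It decodes the same bytes both
ways and prints the resulting code points.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NameDecodingSketch {

    // Hypothetical helper (not a PDFBox API): decode raw name bytes
    // with the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Format each char of the string as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample: two bytes from the #80 to #9F range, then 'A'.
        byte[] raw = { (byte) 0x80, (byte) 0x93, (byte) 'A' };

        String latin1 = decode(raw, StandardCharsets.ISO_8859_1);
        String cp1252 = decode(raw, Charset.forName("windows-1252"));

        // ISO 8859-1 maps every byte to the code point of the same value.
        System.out.println("ISO 8859-1:   " + codePoints(latin1));
        // Windows-1252 maps #80 to the euro sign and #93 to a left
        // double quotation mark.
        System.out.println("Windows-1252: " + codePoints(cp1252));
    }
}
```

With Windows-1252, bytes in that range can decode to characters above U+00FF,
which is the kind of value the check in the issue description looks for.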

> COSName is not ascii
> --------------------
>
>                 Key: PDFBOX-3519
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3519
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.3
>            Reporter: simon steiner
>         Attachments: COSNameAcrobat.png
>
>
> Trunk seems ok
> PDF is from PDFBOX-783
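> {code}
> import java.io.File;
> import java.io.IOException;
> import org.apache.pdfbox.cos.COSBase;
> import org.apache.pdfbox.cos.COSDictionary;
> import org.apache.pdfbox.cos.COSName;
> import org.apache.pdfbox.cos.COSObject;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> // Wrapper class added so the snippet compiles; the name is arbitrary.
> public class COSNameCheck {
>     public static void main(String[] args) throws IOException {
>         PDDocument doc = PDDocument.load(new File("A02Gj780LZ.pdf"));
>         COSDictionary x = doc.getPage(0).getResources().getCOSObject();
>         read(x);
>         doc.close();
>     }
>
>     // Walk the resources recursively and throw on the first COSName
>     // whose first character is above 256.
>     private static void read(COSBase b) {
>         if (b instanceof COSObject) {
>             read(((COSObject) b).getObject());
>         } else if (b instanceof COSDictionary) {
>             for (COSBase x : ((COSDictionary) b).getValues()) {
>                 read(x);
>             }
>         } else if (b instanceof COSName) {
>             if (((COSName) b).getName().charAt(0) > 256)
>                 throw new RuntimeException(((COSName) b).getName());
>         }
>     }
> }
> {code}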



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
