[
https://issues.apache.org/jira/browse/PDFBOX-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543098#comment-15543098
]
Maruan Sahyoun edited comment on PDFBOX-3519 at 10/3/16 6:42 PM:
-----------------------------------------------------------------
Yes, it's equivalent to an ASCII interpretation.
Worth noting is that
- using hex does not automatically mean that we are dealing with Unicode. E.g.
#20 shall be used for encoding a space within a name; prior to PDF 1.2, #
alone was used as a placeholder
- after decoding the hex codes, a name might be in UTF-8
- the bytes making up the name shall be used as the name's value, not the
string representation.
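
To illustrate the three points above, here is a minimal sketch that turns the raw spelling of a name (without the leading slash) into its bytes. The class and method names are made up for this example and are not PDFBox API; it also assumes that every character outside a #xx escape fits into a single byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of PDFBox.
public class NameEscapeSketch {

    // Decodes #xx escapes in the raw spelling of a PDF name into bytes.
    // A '#' that is not followed by two hex digits is copied through unchanged.
    public static byte[] decodeName(String raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '#' && i + 2 < raw.length()
                    && Character.digit(raw.charAt(i + 1), 16) >= 0
                    && Character.digit(raw.charAt(i + 2), 16) >= 0) {
                int hi = Character.digit(raw.charAt(i + 1), 16);
                int lo = Character.digit(raw.charAt(i + 2), 16);
                out.write((hi << 4) | lo);
                i += 3;
            } else {
                // Assumption: unescaped characters are single-byte.
                out.write(c & 0xFF);
                i++;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // #20 is just a space; no Unicode involved.
        byte[] plain = decodeName("Lime#20Green");
        System.out.println(new String(plain, StandardCharsets.US_ASCII));

        // The same five bytes read differently depending on the charset,
        // which is why the bytes, not a String, are the name's value.
        byte[] accented = decodeName("caf#C3#A9");
        System.out.println(new String(accented, StandardCharsets.UTF_8));
        System.out.println(new String(accented, StandardCharsets.ISO_8859_1));
    }
}
```

The second name decodes to the bytes 63 61 66 C3 A9. Read as UTF-8 this gives four characters, read as ISO-8859-1 it gives five, so comparing names on their string form can go wrong where comparing the bytes would not.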
There is also an error in the above string, which I'll correct (I was using a
comma instead of the quotation mark).
According to the extended ASCII table Windows-1252 (in ISO-8859-1 itself, 0x82 and 0x96 are C1 control characters):
#82 is {{‚}} {{Single low-9 quotation mark}}
#96 is {{–}} {{En dash}}
Details in the PDF spec.
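
For anyone who wants to check the #82 and #96 mappings, the following sketch decodes both bytes under two single-byte charsets and prints the resulting code points. The class name is invented for this example, and comparing exactly these two charsets is my assumption about where the glyphs come from, not something taken from the PDF spec.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of PDFBox.
public class HighByteSketch {

    // Decodes a single byte value (0..255) with the given charset.
    public static String decode(int b, Charset cs) {
        return new String(new byte[] {(byte) b}, cs);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        for (int b : new int[] {0x82, 0x96}) {
            String latin1 = decode(b, StandardCharsets.ISO_8859_1);
            String win = decode(b, cp1252);
            System.out.printf("#%02X  ISO-8859-1: U+%04X  windows-1252: U+%04X%n",
                    b, (int) latin1.charAt(0), (int) win.charAt(0));
        }
    }
}
```

ISO-8859-1 maps each byte to the code point of the same value, so 0x82 and 0x96 come out as U+0082 and U+0096 (control characters). Windows-1252 maps them to U+201A (single low-9 quotation mark) and U+2013 (en dash).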
> COSName is not ascii
> --------------------
>
> Key: PDFBOX-3519
> URL: https://issues.apache.org/jira/browse/PDFBOX-3519
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.3
> Reporter: simon steiner
> Attachments: COSNameAcrobat.png
>
>
> Trunk seems ok
> PDF is from PDFBOX-783
> {code}
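> // Reproduction sketch; the class name CosNameCheck is arbitrary.
> import java.io.File;
> import java.io.IOException;
> import org.apache.pdfbox.cos.COSBase;
> import org.apache.pdfbox.cos.COSDictionary;
> import org.apache.pdfbox.cos.COSName;
> import org.apache.pdfbox.cos.COSObject;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> public class CosNameCheck {
>     public static void main(String[] args) throws IOException {
>         PDDocument doc = PDDocument.load(new File("A02Gj780LZ.pdf"));
>         COSDictionary x = doc.getPage(0).getResources().getCOSObject();
>         read(x);
>         doc.close();
>     }
>
>     // Walks the resources and fails on a name whose first char is above 256.
>     private static void read(COSBase b) {
>         if (b instanceof COSObject) {
>             read(((COSObject) b).getObject());
>         } else if (b instanceof COSDictionary) {
>             for (COSBase x : ((COSDictionary) b).getValues()) {
>                 read(x);
>             }
>         } else if (b instanceof COSName) {
>             if (((COSName) b).getName().charAt(0) > 256) {
>                 throw new RuntimeException(((COSName) b).getName());
>             }
>         }
>     }
> }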
> {code}