Hi all

I've been working with PDFName in my code and have run into a bit of an oddity I was hoping for comments on.

For any given string `fred', the operation:

   ( new PDFName(fred) ).getName().equals(fred)

isn't guaranteed to be true, because PDFName.getName() returns the *escaped* name. It strips the leading slash added by PDFName.escapeName(), so most of the time the returned name will be the same, but it's a good candidate for creating exciting bugs.

I'd like to be able to use PDFName instead of String as a map key (for clarity, mostly), but need to be able to get the original encapsulated string quickly (without decoding) and reliably.

I'd like to change PDFName so that it keeps a reference to the original name string and returns that from getName(). The name would be encoded on the first call to a new getEncodedName() method and cached in a member field, so short-lived PDFName objects don't waste time encoding strings. I'd also like getEncodedName() to return a byte[] rather than a String, since an encoded PDF name isn't actually text data.
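
To make the proposal concrete, here is a rough sketch of the shape I have in mind. It is not a patch against FOP: LazyPdfName and its private encode() helper are hypothetical stand-ins, the escaping rule (UTF-8 bytes, then #XX for anything outside '!'..'~' plus '#' and the delimiter characters) is my assumption of what the encoder should do, and it assumes Java 7+ for StandardCharsets.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the proposed PDFName; not FOP's actual class. */
final class LazyPdfName {
    private final String name; // original, unescaped name
    private byte[] encoded;    // filled in on the first getEncodedName() call

    LazyPdfName(String name) {
        if (name == null) {
            throw new NullPointerException("name");
        }
        this.name = name;
    }

    /** Returns the original string, so new LazyPdfName(s).getName().equals(s) holds. */
    String getName() {
        return name;
    }

    /** Returns the escaped form, leading slash included, as raw bytes. */
    byte[] getEncodedName() {
        if (encoded == null) {
            encoded = encode(name); // not synchronized; worst case it is computed twice
        }
        return encoded.clone(); // defensive copy so callers cannot alter the cache
    }

    /** Assumed escaping rule: UTF-8 encode, then write irregular bytes as #XX. */
    private static byte[] encode(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('/');
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean regular = v >= '!' && v <= '~' && "#/%()<>[]{}".indexOf(v) < 0;
            if (regular) {
                out.write(v);
            } else {
                out.write('#');
                out.write(Character.toUpperCase(Character.forDigit(v >> 4, 16)));
                out.write(Character.toUpperCase(Character.forDigit(v & 0xF, 16)));
            }
        }
        return out.toByteArray();
    }

    /** Equality and hashing use the original name, so map lookups never encode. */
    @Override
    public boolean equals(Object o) {
        return o instanceof LazyPdfName && name.equals(((LazyPdfName) o).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}
```

Because equals() and hashCode() only look at the original string, an object like this can be used as a HashMap key without ever paying for the encoding step. Only the code that actually writes the name to the output stream would call getEncodedName().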

BTW, is there any reason FOP's PDF library uses java.lang.String when working with sequences of PDF data bytes? For example, the output of PDFName.escapeName(...) isn't really a "string" at all: it's not meaningful text in any encoding, just a byte sequence jammed into the lower 8 bits of Unicode code points. It's pretty confusing having it as a String (logically an array of Unicode characters) rather than as a byte[].

Right now, FOP also writes 8-bit characters in names incorrectly. The toHex(...) and PDFName.escapeName(...) methods take each *Unicode* *character* in a String whose value is between 128 and 255 inclusive, translate that value to hex, and write it out. This is incorrect, because PDF names should be UTF-8, so the name should be encoded to a UTF-8 byte sequence first and then escaped.
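
To illustrate the difference, here is a small self-contained comparison. It is only an approximation: NameEscapeDemo and its methods are made-up names, escapePerChar() models the per-character behaviour described above by treating each character as one ISO-8859-1 byte (it is not FOP's actual code), and escapeUtf8() is the ordering I'm arguing for, on the premise that names should be UTF-8.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class; none of these methods exist in FOP. */
final class NameEscapeDemo {

    /** Writes bytes outside '!'..'~', plus '#' and delimiters, as #XX. */
    static String escapeBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            boolean regular = v >= '!' && v <= '~' && "#/%()<>[]{}".indexOf(v) < 0;
            if (regular) {
                sb.append((char) v);
            } else {
                sb.append(String.format("#%02X", v));
            }
        }
        return sb.toString();
    }

    /**
     * Models the per-character approach: each char value up to 255 becomes
     * one byte. Characters above 255 cannot be represented this way at all.
     */
    static String escapePerChar(String name) {
        return escapeBytes(name.getBytes(StandardCharsets.ISO_8859_1));
    }

    /** Encodes to UTF-8 first, then escapes the resulting bytes. */
    static String escapeUtf8(String name) {
        return escapeBytes(name.getBytes(StandardCharsets.UTF_8));
    }
}
```

For the name "caf\u00e9" (the last character is U+00E9), escapePerChar() gives caf#E9 while escapeUtf8() gives caf#C3#A9. Plain ASCII names come out identical either way, which is why the problem is easy to miss.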

Craig Ringer

POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088     Fax: 08 9388 2258
ABN: 50 008 917 717
