Re: PDFName.getName() returns escaped name?!

Vincent Hennebert Mon, 02 Apr 2012 10:17:28 -0700

Hi Craig,

On 30/03/12 01:23, Craig Ringer wrote:
> On 03/30/2012 05:09 AM, J.Pietschmann wrote:
>> Am 29.03.2012 01:24, schrieb Craig Ringer:
>>> I'd also like to have getEncodedName() return a byte[] not a
>>> String, since an encoded PDF name isn't actually text data.
>> Sounds like a reasonable idea.
>>
>>> BTW, is there any reason Fop's PDF library uses java.lang.String when
>>> working with sequences of PDF data bytes?
>> I'd chalk this up to "historical reasons", as usual. Fell free to
>> provide a patch which cleans this up.
>>
>> J.Pietschmann
> 
> Here's how I'd like to rewrite PDFName; untested code as an example of what
> I'm getting at. This is just a standalone file; a patch that incorporates it
> into the main sources will be a lot more work that I'm holding off on until I
> know folks here agree with the approach.


>From a quick look that sounds about right. Are you developing with a 1.7 JDK?
You would have to make your code 1.5-compatible. Also, it would be good if
you could back your optimizations with profiling data. If code
safety/readability have to be compromised there has to be a good reason
for it.


> In any case, after reading more of the PDF library I'm rethinking the wisdom
> of trying to make this change. The change its self is correct, but it'll be
> really hard to safely integrate into the rest of the PDF library because of
> the difficulty of auditing every site to ensure nothing breaks. Java likes to
> call `toString' automatically in places, meaning that anywhere that doesn't
> use the proper PDFWritable output methods PDFName inherits will break by
> producing bad PDF data that might be quite hard to spot. I'd start by making
> PDFName.toString() throw (for testing), but that'd only catch issues in code
> that test paths actually hit.
> 
> Given the number of these kinds of issues in fop's pdf library I'm more and
> more inclined to wonder if it should just be replaced with PDFBox. It's *full*
> of text encoding issues, it crams 8-bit binary data into the lower 8 bits of
> Unicode strings, etc. Most of the classes that extend basics like
> PDFDictionary act like the base class isn't public API and break if anyone
> else changes the dictionary in ways they don't expect, too; they should
> "have-a" PDFDictionary not "be-a" PDFDictionary really.
> 
> PDFBox is far from perfect, but it has a clean separation between the model
> classes (PDxxxx) and the basic PDF data types (COSxxx); it has a clean
> PDFName, PDFString, etc; it has a good PDF parser already, etc. Maybe it'd be
> easier for me to whip up a port of FOP's PDF output code to PDFBox? I suspect
> I'm insane to mention the possibility of doing that without evaluating the
> amount of work involved first, so I'm not promising anything, but by the looks
> it might be easier than doing the cleanups I'd like to do in fop.

We’ve already talked about that before:
http://markmail.org/message/r6pjwwmxgaawhzcp

There is a /lot/ of work to do on the PDF package to turn it into
a proper library, still it may be more practical to progressively
refactor it rather than starting from scratch a new renderer based on
PDFBox.

That said, please do feel free to give it a try if that route appeals to
you. We would certainly consider a switch if it looks more promising in
the long term.

> Thoughts?
> 
> -- 
> Craig Ringer

Thanks,
Vincent

Re: PDFName.getName() returns escaped name?!

Reply via email to