Maruan

With the exception of the documentation issue, these are not subjective matters;
you can’t disagree with an objective truth. Either falsify my claims or concede
that I am correct. We need to reach a technical resolution on this.
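
To make the pattern claim concrete, here is a minimal sketch using plain
java.awt.geom.AffineTransform rather than PDFBox classes. The names
(PatternSpaceSketch, patternToDevice, patternOrigin) are hypothetical and the
matrix values are made up for illustration. It only shows that anchoring the
pattern matrix to the parent stream's initial coordinate space and anchoring it
to the current CTM give different results once the CTM has been modified.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Hypothetical illustration only; none of these names exist in PDFBox.
final class PatternSpaceSketch {

    // Builds the pattern-space -> device-space transform on top of a base matrix.
    // concatenate() applies patternMatrix to a point first, then the base matrix.
    static AffineTransform patternToDevice(AffineTransform base,
                                           AffineTransform patternMatrix) {
        AffineTransform result = new AffineTransform(base);
        result.concatenate(patternMatrix);
        return result;
    }

    // Maps the pattern-space origin (0, 0) to device space for the given base matrix.
    static Point2D patternOrigin(AffineTransform base, AffineTransform patternMatrix) {
        return patternToDevice(base, patternMatrix)
                .transform(new Point2D.Double(0, 0), null);
    }

    public static void main(String[] args) {
        // Assumed initial matrix of the parent content stream (identity for simplicity).
        AffineTransform initial = new AffineTransform();

        // The CTM after the content stream has executed "2 0 0 2 0 0 cm".
        AffineTransform ctm = new AffineTransform(initial);
        ctm.scale(2, 2);

        // Assumed pattern matrix: shift the pattern cell 10 units along x.
        AffineTransform patternMatrix = AffineTransform.getTranslateInstance(10, 0);

        // Anchored to the parent stream's initial space (what I am arguing for).
        Point2D viaInitial = patternOrigin(initial, patternMatrix);

        // Anchored to the current CTM (the behaviour I am arguing against).
        Point2D viaCtm = patternOrigin(ctm, patternMatrix);

        System.out.println("via initial space: " + viaInitial);
        System.out.println("via current CTM:   " + viaCtm);
    }
}
```

With these values the pattern origin lands at x = 10 when anchored to the
initial space and at x = 20 when the modified CTM is used, so the two
approaches cannot both be right for the same file.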

-- John

On 19 Mar 2014, at 13:48, Maruan Sahyoun <[email protected]> wrote:

> John
> 
> Am 19.03.2014 um 19:10 schrieb John Hewson <[email protected]>:
> 
>> Maruan,
>> 
>>>>> From how I understand the rendering in PDF Form, Text, Image and Pattern 
>>>>> maintain their own matrix to map to user space which is then transformed 
>>>>> by the CTM to device space so handling them specifically is fine and 
>>>>> in line with the spec.
>>>> 
>>>> No, that’s not right, what I said was:
>>>> 
>>>>>> My problem is that tiling patterns are defined in their parent stream’s 
>>>>>> initial coordinate space, rather than the
>>>>>> coordinate space defined by the CTM.
>>>> 
>>>> So patterns should *not* be using the CTM, which is what I’m trying to 
>>>> achieve.
>>>> 
>>> 
>>> I think you misunderstood what I wrote - patterns have their own matrix - 
>>> so I think we are on the same page here. IMHO according to the spec CTM 
>>> transforms from user space to device space. So it’s pattern space -> user 
>>> space -> device space.
>> 
>> Nope, as I said, that’s what PDFBox currently does and it’s wrong. As you 
>> say the CTM transforms from user space to device space, but it’s not the 
>> only way to do so, and it is not used by patterns.
> 
> The processing is defined in the spec, which is a good reference, so there is
> no need to discuss this further. Of course, different people might come to
> different conclusions by reading and interpreting the spec.
> 
>> 
>>> I didn’t mean to only reference the spec but to use the same terms as 
>>> described by the spec. Adding references to the spec is an add-on not a 
>>> replacement.
>> 
>> I don’t see what value this adds, given that the references will just go 
>> out-of-date when the next spec is released. We already use the same 
>> terminology as the PDF spec, so Ctrl+F can be used for quick look-ups that 
>> won’t go out-of-date.
> 
> You are not required to add the information.
> 
>> 
>>>> This isn’t possible; as I said, it “will necessarily be a breaking change”. 
>>>> This is because in 2.0 PDFStreamEngine needs to know the parent of each 
>>>> stream, but processStream and processSubStream do not provide this 
>>>> information. That’s why I’m discussing this on the mailing list.
>>> 
>>> I don’t understand why this shouldn’t be possible. It’s more effort, 
>>> agreed, but beneficial.
>> 
>> 
>> What’s not to understand? PDFStreamEngine *needs* to know the parent of each 
>> stream, and the old methods don’t provide this. Passing a null parent will 
>> not work because we need that information later in order to correctly 
>> process the stream. If we allowed a null parent to be passed, the result 
>> would be silently broken rendering - there’s no value in providing a 
>> backwards-compatible API if it can only produce broken results.
> 
> We won’t reach the same conclusion here (nor, I think, on the other topics
> above).
> 
>> 
>> -- John
>> 
>> On 19 Mar 2014, at 10:31, Maruan Sahyoun <[email protected]> wrote:
>> 
>>> John,
>>> 
>>> Am 19.03.2014 um 18:15 schrieb John Hewson <[email protected]>:
>>> 
>>>> Maruan
>>>> 
>>>>> From how I understand the rendering in PDF Form, Text, Image and Pattern 
>>>>> maintain their own matrix to map to user space which is then transformed 
>>>>> by the CTM to device space so handling them specifically is fine and 
>>>>> in line with the spec.
>>>> 
>>>> No, that’s not right, what I said was:
>>>> 
>>>>>> My problem is that tiling patterns are defined in their parent stream’s 
>>>>>> initial coordinate space, rather than the
>>>>>> coordinate space defined by the CTM.
>>>> 
>>>> So patterns should *not* be using the CTM, which is what I’m trying to 
>>>> achieve.
>>>> 
>>> 
>>> I think you misunderstood what I wrote - patterns have their own matrix - 
>>> so I think we are on the same page here. IMHO according to the spec CTM 
>>> transforms from user space to device space. So it’s pattern space -> user 
>>> space -> device space.
>>> 
>>> 
>>>>> I’d suggest that we make sure that the different ‘spaces’ are defined 
>>>>> properly within the code and refer to the PDF spec so that the code is 
>>>>> easier to read if this is not already the case. With so many changes it’s 
>>>>> a good opportunity to enhance the documentation within the source code. 
>>>>> Some of the old code enjoys very little documentation.
>>>> 
>>>> 
>>>> I disagree, in general I don’t think that references to the PDF spec are a 
>>>> good form of documentation (there are some exceptions). References to the 
>>>> spec are meaningless to the reader unless they take the time to look them 
>>>> up in a 700 page PDF document. I would argue that by just linking back to 
>>>> the spec, we have *failed* to document PDFBox, not succeeded.
>>>> 
>>>> References to the PDF spec have another major flaw: they go out-of-date. 
>>>> For example a Pattern Colour Space will always be called “Pattern Colour 
>>>> Space” in future versions of the PDF spec but it may not be described in 
>>>> paragraph 8.6.6.2 or on page 156. The existing code contains many 
>>>> references to the PDF 1.6 and 1.7 specs as well as the ISO PDF32000 spec, 
>>>> which means that I need three 700 page PDF files open at all times in 
>>>> order to look up PDFBox references. With the new version of the PDF spec 
>>>> due this year, this situation is going to get worse.
>>>> 
>>> 
>>> I didn’t mean to only reference the spec but to use the same terms as 
>>> described by the spec. Adding references to the spec is an add-on not a 
>>> replacement.
>>> 
>>>> I agree that some of the existing code needs more documentation, and I 
>>>> often add documentation to old files which I’m working on. However, my 
>>>> approach is to just paste in a sentence or two from the PDF spec (fair 
>>>> use). That way the reader does not ever need to look at the PDF spec. 
>>>> Because we use the same terminology in PDFBox as in the spec, if someone 
>>>> really wants to look something up, it’s as simple as Ctrl+F, no reference 
>>>> needed, and it’s guaranteed not to go out-of-date.
>>>> 
>>>>> I wouldn’t remove processStream and processSubStream but deprecate them 
>>>>> and remove them in the next major release, so as to keep the changes 
>>>>> to a minimum.
>>>> 
>>>> This isn’t possible; as I said, it “will necessarily be a breaking change”. 
>>>> This is because in 2.0 PDFStreamEngine needs to know the parent of each 
>>>> stream, but processStream and processSubStream do not provide this 
>>>> information. That’s why I’m discussing this on the mailing list.
>>> 
>>> I don’t understand why this shouldn’t be possible. It’s more effort, 
>>> agreed, but beneficial.
>>> 
>>>> 
>>>>> For the rendering what might have been missed is taking the UserUnit 
>>>>> entry in the page dictionary into account which might change the default 
>>>>> user space. This was introduced in PDF 1.6. A good opportunity to read 
>>>>> that entry and make sure that we handle it appropriately.
>>>> 
>>>> Yes, I have this as a “todo” in my working copy. However, if we put the 
>>>> UserUnit in the matrix then we should also put the page Rotation into the 
>>>> matrix, but that’s a significant change.
>>>> 
>>>> -- John
