Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards

Maruan Sahyoun Mon, 10 Mar 2014 03:58:02 -0700

OK, I wasn’t precise enough: the token types didn’t change, but newer tokens 
were introduced.


As the syntax has changed, do we then need version and standards support in 
the parsing phase? The other way would be to parse whatever is in the file and 
do validation etc. purely on the parsing result (COS model, PD model). We need 
to do that anyway.

What about writing?

BR
Maruan Sahyoun

Am 10.03.2014 um 11:43 schrieb John Hewson <[email protected]>:

>>> If the syntax hasn’t changed then there can’t be anything in the parser 
>>> which is version-specific.
>> 
>> I think we are talking about two different things here: the parsing process 
>> to get the tokens, and the parsing process to follow the PDF file layout and 
>> to form and follow the higher-level structures such as Xref.
> 
> Yes, there are two phases, tokenizing and parsing; sometimes both are called 
> parsing.
> 
>> Tokens didn’t change. File layout and higher-level structures did, e.g. 
>> Linearization or Xref streams. Depending on the PDF standard, some are 
>> permitted and some are not.
> 
> That’s not right. The tokens have changed: “xref” is a keyword and therefore 
> a token. Also, as I said originally, the syntax has changed, because what you 
> call “higher level structures” is actually the syntax.
> 
> -- John
> 
> On 10 Mar 2014, at 02:32, Maruan Sahyoun <[email protected]> wrote:
> 
>> I think we are talking about two different things here: the parsing process 
>> to get the tokens, and the parsing process to follow the PDF file layout and 
>> to form and follow the higher-level structures such as Xref. Tokens didn’t 
>> change. File layout and higher-level structures did, e.g. Linearization or 
>> Xref streams. Depending on the PDF standard, some are permitted and some are 
>> not.
>> 
>> BR
>> Maruan
>> 
>> Am 10.03.2014 um 10:06 schrieb John Hewson <[email protected]>:
>> 
>>>> The base syntax has not changed, but the elements described by the base 
>>>> syntax have.
>>> 
>>> 
>>> If the syntax hasn’t changed then there can’t be anything in the parser 
>>> which is version-specific.
>>> 
>>> -- John
>>> 
>>> On 10 Mar 2014, at 01:43, Maruan Sahyoun <[email protected]> wrote:
>>> 
>>>> Hi John,
>>>> 
>>>> it’s not only about PDF versions, but about PDF versions and standards.
>>>> 
>>>> The base syntax has not changed, but the elements described by the base 
>>>> syntax have.
>>>> 
>>>> BR
>>>> Maruan Sahyoun
>>>> 
>>>> Am 10.03.2014 um 09:20 schrieb John Hewson <[email protected]>:
>>>> 
>>>>> Hi Maruan
>>>>> 
>>>>>> As of today, PDFBox has no formal support for specific PDF versions in 
>>>>>> a way that a specific version can be enforced, validated ...
>>>>> 
>>>>> Perhaps that is because there is not much demand for this? Nowadays 
>>>>> everyone has instant access to the latest version of Adobe Reader, so 
>>>>> checking that a PDF can be opened with a specific version of Adobe Reader 
>>>>> is not that useful anymore. There might be some niche cases, but I can’t 
>>>>> think what they would be. For cases where it’s important that a PDF file 
>>>>> is valid, a format such as PDF/A or PDF/X must be used instead, as 
>>>>> “vanilla” PDF is ambiguous.
>>>>> 
>>>>>> The PDFBox PDF/A validation does a good job for PDF/A-1b, but it cannot 
>>>>>> easily be extended to other standards.
>>>>> 
>>>>> Yes, PDF/A is carefully validated because it is for archival purposes, 
>>>>> unlike regular PDF files.
>>>>> 
>>>>>> Do you think that there is a need for more formal support of such 
>>>>>> standards and versions? This would influence some of the design 
>>>>>> decisions for the parser and affect the base objects.
>>>>> 
>>>>> 
>>>>> I can’t think of a reason why someone would want to parse a specific PDF 
>>>>> version, so my answer is no: I don’t think there is such a need. Has the 
>>>>> syntax of PDF even changed that much over the different versions?
>>>>> 
>>>>> — John
>>>>> 
>>>> 
>>> 
>> 
>
