Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards

John Hewson Mon, 10 Mar 2014 03:44:17 -0700

>> If the syntax hasn’t changed then there can’t be anything in the parser 
>> which is version-specific.
> 
> I think we are talking about two different things here. The parsing process 
> to get the tokens and the parsing process to follow the PDF file layout and 
> to form and follow the higher level structures such as Xref.


Yes, there are two phases, tokenizing and parsing; sometimes both are called 
parsing.

> Tokens didn’t change. File layout and higher-level structures did, e.g. 
> Linearization or Xref Streams. Depending on the PDF standard, some are 
> permitted and some are not.

That’s not right. The tokens have changed: “xref” is a keyword and therefore a 
token. Also, as I said originally, the syntax has changed, because what you 
call “higher level structures” is actually the syntax.
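To make the two phases concrete, here is a minimal Java sketch. All class and method names (PdfPhasesSketch, tokenize, crossReferenceForm, permittedIn) are hypothetical and are not PDFBox API. It assumes whitespace-separated input, which a real PDF lexer does not require (delimiters such as "<<" or "/" can touch adjacent tokens). The tokenizer classifies "xref" as a keyword token; the parser then decides whether the cross-reference data is a classic table or a stream with /Type /XRef, and a version check shows where a version-dependent rule (cross-reference streams were introduced in PDF 1.5) would sit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not PDFBox API: separates tokenizing from parsing.
public class PdfPhasesSketch {

    enum Kind { KEYWORD, NUMBER, NAME, OTHER }

    record Token(Kind kind, String text) {}

    // A few PDF keywords; "xref" is one of them, so it becomes a KEYWORD token.
    static final Set<String> KEYWORDS =
            Set.of("xref", "trailer", "startxref", "obj", "endobj", "stream", "endstream");

    // Phase 1: tokenizing. Assumes tokens are separated by whitespace.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            Kind kind;
            if (KEYWORDS.contains(word)) {
                kind = Kind.KEYWORD;
            } else if (word.matches("[+-]?\\d+(\\.\\d+)?")) {
                kind = Kind.NUMBER;
            } else if (word.startsWith("/")) {
                kind = Kind.NAME;
            } else {
                kind = Kind.OTHER;
            }
            tokens.add(new Token(kind, word));
        }
        return tokens;
    }

    // Phase 2: parsing. A classic table starts with the "xref" keyword;
    // a cross-reference stream has /Type /XRef in its dictionary.
    static String crossReferenceForm(List<Token> tokens) {
        if (!tokens.isEmpty()
                && tokens.get(0).kind() == Kind.KEYWORD
                && tokens.get(0).text().equals("xref")) {
            return "table";
        }
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).text().equals("/Type") && tokens.get(i + 1).text().equals("/XRef")) {
                return "stream";
            }
        }
        return "unknown";
    }

    // Version-dependent rule: cross-reference streams require PDF 1.5 or later.
    static boolean permittedIn(String form, int major, int minor) {
        if (!form.equals("stream")) {
            return true;
        }
        return major > 1 || (major == 1 && minor >= 5);
    }

    public static void main(String[] args) {
        String classic = crossReferenceForm(tokenize("xref 0 1 0000000000 65535 f"));
        String stream = crossReferenceForm(tokenize("12 0 obj << /Type /XRef /Size 13 >> stream"));
        System.out.println("classic: " + classic);
        System.out.println("stream: " + stream);
        System.out.println("stream allowed in 1.4: " + permittedIn(stream, 1, 4));
        System.out.println("stream allowed in 1.5: " + permittedIn(stream, 1, 5));
    }
}
```

Running main should print "classic: table", "stream: stream", then false for PDF 1.4 and true for PDF 1.5. Note that the version check lives in the parsing phase; the tokenizer itself has no notion of version.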

-- John

On 10 Mar 2014, at 02:32, Maruan Sahyoun <sahy...@fileaffairs.de> wrote:

> I think we are talking about two different things here. The parsing process 
> to get the tokens, and the parsing process to follow the PDF file layout and 
> to form and follow the higher level structures such as Xref. Tokens didn’t 
> change. File layout and higher-level structures did, e.g. Linearization or 
> Xref Streams. Depending on the PDF standard, some are permitted and some are not.
> 
> BR
> Maruan
> 
> On 10 Mar 2014, at 10:06, John Hewson <j...@jahewson.com> wrote:
> 
>>> The base syntax has not changed, but the elements described by the base 
>>> syntax have.
>> 
>> 
>> If the syntax hasn’t changed then there can’t be anything in the parser 
>> which is version-specific.
>> 
>> -- John
>> 
>> On 10 Mar 2014, at 01:43, Maruan Sahyoun <sahy...@fileaffairs.de> wrote:
>> 
>>> Hi John,
>>> 
>>> it’s not only about PDF versions but about PDF versions and standards.
>>> 
>>> The base syntax has not changed, but the elements described by the base 
>>> syntax have.
>>> 
>>> BR
>>> Maruan Sahyoun
>>> 
>>> On 10 Mar 2014, at 09:20, John Hewson <j...@jahewson.com> wrote:
>>> 
>>>> Hi Maruan
>>>> 
>>>>> As of today PDFBox has no formal support for specific PDF versions in a 
>>>>> way that a specific version can be enforced, validated ...
>>>> 
>>>> Perhaps that is because there is not much demand for this? Nowadays 
>>>> everyone has instant access to the latest version of Adobe Reader, so 
>>>> checking that a PDF can be opened with a specific version of Adobe Reader 
>>>> is not that useful anymore. There might be some niche cases, but I can’t 
>>>> think what they would be. Where it’s important that a PDF file 
>>>> is valid, a format such as PDF/A or PDF/X must be used instead, as 
>>>> “vanilla” PDF is ambiguous.
>>>> 
>>>>> The PDFBox PDF/A validation does a good job for PDF/A-1b, but it cannot 
>>>>> be easily extended to other standards.
>>>> 
>>>> Yes, PDF/A is carefully validated because it is for archival purposes, 
>>>> unlike regular PDF files.
>>>> 
>>>>> Do you think that there is a need for more formal support of such 
>>>>> standards and versions? That would influence some of the design decisions 
>>>>> for the parser and affect the base objects.
>>>> 
>>>> 
>>>> I can’t think of a reason why someone would want to parse a specific PDF 
>>>> version, so my answer is no, I don’t think there is such a need. Has the 
>>>> syntax of PDF even changed that much over the different versions?
>>>> 
>>>> — John
>>>> 
>>> 
>> 
>
