>> If the syntax hasn’t changed then there can’t be anything in the parser
>> which is version-specific.
>
> I think we are talking about two different things here. The parsing process
> to get the tokens and the parsing process to follow the PDF file layout and
> to form and follow the higher level structures such as Xref.
Yes, there are two phases, tokenizing and parsing; sometimes both are called parsing.

> Tokens didn’t change. File layout and higher level structures did, like
> Linearization or Xref Streams. Depending on the PDF standard, some are
> permitted and some are not.

That’s not right. The tokens have changed: “xref” is a keyword and therefore a token. Also, as I said originally, the syntax has changed, because what you call “higher level structures” is actually the syntax.

-- John

On 10 Mar 2014, at 02:32, Maruan Sahyoun <sahy...@fileaffairs.de> wrote:

> I think we are talking about two different things here. The parsing process
> to get the tokens, and the parsing process to follow the PDF file layout and
> to form and follow the higher level structures such as Xref. Tokens didn’t
> change. File layout and higher level structures did, like Linearization or
> Xref Streams. Depending on the PDF standard, some are permitted and some are not.
>
> BR
> Maruan
>
> On 10.03.2014 at 10:06, John Hewson <j...@jahewson.com> wrote:
>
>>> The base syntax has not changed. But the elements described by the base
>>> have.
>>
>>
>> If the syntax hasn’t changed then there can’t be anything in the parser
>> which is version-specific.
>>
>> -- John
>>
>> On 10 Mar 2014, at 01:43, Maruan Sahyoun <sahy...@fileaffairs.de> wrote:
>>
>>> Hi John,
>>>
>>> It’s not about PDF versions but PDF versions and standards.
>>>
>>> The base syntax has not changed. But the elements described by the base
>>> have.
>>>
>>> BR
>>> Maruan Sahyoun
>>>
>>> On 10.03.2014 at 09:20, John Hewson <j...@jahewson.com> wrote:
>>>
>>>> Hi Maruan
>>>>
>>>>> As of today PDFBox has no formal support for specific PDF versions in a
>>>>> way that a specific version can be enforced, validated ...
>>>>
>>>> Perhaps that is because there is not much demand for this?
>>>> Nowadays everyone has instant access to the latest version of Adobe
>>>> Reader, so checking that a PDF can be opened with a specific version of
>>>> Adobe Reader is not that useful anymore. There might be some niche cases,
>>>> but I can’t think what they would be. For cases where it’s important that
>>>> a PDF file is valid, a format such as PDF/A or PDF/X must be used instead,
>>>> as “vanilla” PDF is ambiguous.
>>>>
>>>>> The PDFBox PDF/A validation does a good job for PDF/A-1b but it cannot
>>>>> be easily extended to other standards.
>>>>
>>>> Yes, PDF/A is carefully validated because it is for archival purposes,
>>>> unlike regular PDF files.
>>>>
>>>>> Do you think that there is a need for more formal support of such
>>>>> standards and versions? That would influence some of the design decisions
>>>>> for the parser and affect the base objects.
>>>>
>>>>
>>>> I can’t think of a reason why someone would want to parse a specific PDF
>>>> version, so my answer is no, I don’t think there is such a need. Has the
>>>> syntax of PDF even changed that much over the different versions?
>>>>
>>>> -- John
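To make the two phases discussed in this thread concrete, here is a minimal sketch, written in Python for brevity rather than PDFBox's Java. The functions `tokenize` and `xref_style` are hypothetical illustrations, not PDFBox APIs, and the tokenizer covers only a small subset of PDF lexical rules (no strings, hex strings, comments or stream data). It shows “xref” coming out of the first phase as a keyword token, and a version check on cross-reference streams (introduced in PDF 1.5) happening in the second phase, where the token sequence is interpreted.

```python
import re

# Phase 1 (tokenizer): a deliberately simplified subset of PDF tokens.
# Strings, hex strings, comments and stream contents are not handled here.
TOKEN_RE = re.compile(
    r"(?P<delim><<|>>|\[|\])"
    r"|(?P<name>/[^\s/\[\]<>()]*)"
    r"|(?P<number>[+-]?\d+(?:\.\d*)?|[+-]?\.\d+)"
    r"|(?P<keyword>[A-Za-z]+)"
)


def tokenize(text):
    """Turn raw text into a list of (kind, value) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        # lastgroup is the name of the alternative that matched
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# Phase 2 (parser): interpret the token sequence. Classic "xref" tables are
# valid in every PDF version; cross-reference streams (a dictionary with
# /Type /XRef) were introduced in PDF 1.5.
def xref_style(tokens, version):
    if ("keyword", "xref") in tokens:
        return "xref table"
    values = [value for _, value in tokens]
    for i in range(len(values) - 1):
        if values[i] == "/Type" and values[i + 1] == "/XRef":
            if version < (1, 5):
                raise ValueError("cross-reference streams require PDF 1.5 or later")
            return "xref stream"
    raise ValueError("no cross-reference section found")
```

For example, `tokenize("xref 0 1 0000000000 65535 f")` starts with `("keyword", "xref")`, while `xref_style(tokenize("12 0 obj << /Type /XRef /Size 13 >>"), (1, 4))` raises a `ValueError`. Whether that second check counts as “syntax” or as a “higher level structure” is exactly the point being debated above; the sketch only shows where such a check would sit.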