Hi,

I'm working on a preflight refactoring [1].

Currently, the validation class implements the PdfAValidator interface, which
takes a DataSource as a parameter.
This class:
- checks the syntax with a JavaCC parser
- loads a PDDocument object
- runs a set of helpers on each object present in the PDF
- returns a ValidationResult object that contains all errors and the
PDDocument.

Here is what I would like to do:
- Create a PdfAParser that extends an existing parser
(NonSequentialPDFParser or PDFParser) to perform some validations during
the load process (e.g. stream length validation).
- This specific parser will provide an instance of PdfADocument, which
extends PDDocument and contains the list of validation errors.
- Create new helpers that check the PDF in a logical way (page by page)
instead of object by object.
- Create a new implementation of PdfAValidator that uses this new parser,
to keep compatibility with the old version. The PdfAParser should
nevertheless be the new entry point.
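
The proposal above can be sketched roughly as follows. This is only a
minimal, self-contained illustration: every type here (including the
PDDocument stand-in, RawStream and PageHelper) is hypothetical and none of it
is existing PDFBox API. It also runs the page helpers at the end of parse(),
which is just one of the two options raised in the questions further down.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// All types below are hypothetical stand-ins; nothing here is PDFBox API.
class PdfAParserSketch {

    // Stand-in for org.apache.pdfbox.pdmodel.PDDocument.
    static class PDDocument {
        final List<String> pages = new ArrayList<>();
    }

    // Document that also carries the validation errors collected so far.
    static class PdfADocument extends PDDocument {
        private final List<String> errors = new ArrayList<>();

        void addError(String message) { errors.add(message); }

        List<String> getErrors() { return Collections.unmodifiableList(errors); }
    }

    // Toy stream: the length it declares and the bytes actually present.
    static class RawStream {
        final int declaredLength;
        final byte[] data;

        RawStream(int declaredLength, byte[] data) {
            this.declaredLength = declaredLength;
            this.data = data;
        }
    }

    // Helper that checks one page at a time instead of one object at a time.
    interface PageHelper {
        void check(int pageNumber, String pageContent, PdfADocument document);
    }

    static class PdfAParser {
        private final List<PageHelper> helpers;

        PdfAParser(List<PageHelper> helpers) { this.helpers = helpers; }

        PdfADocument parse(List<RawStream> streams, List<String> pages) {
            PdfADocument document = new PdfADocument();
            // Validation done while loading: stream length check.
            for (int i = 0; i < streams.size(); i++) {
                RawStream stream = streams.get(i);
                if (stream.declaredLength != stream.data.length) {
                    document.addError("stream " + i + ": declared length "
                            + stream.declaredLength + " but found "
                            + stream.data.length + " bytes");
                }
            }
            document.pages.addAll(pages);
            // Logical validation: walk the document page by page.
            for (int p = 0; p < document.pages.size(); p++) {
                for (PageHelper helper : helpers) {
                    helper.check(p + 1, document.pages.get(p), document);
                }
            }
            return document;
        }
    }
}
```

A backward-compatible PdfAValidator implementation would then only need to
call such a parser and copy the collected errors into a ValidationResult.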

Because of the validation path that the new parser must follow (validating
the PDF in a logical way instead of object by object), I will create new
classes and reuse existing classes as much as possible to avoid breaking
the current implementation. All classes that become useless will be marked
as deprecated.

I have two questions:
- Do you think that the new validation approach is better?
- Should the end of the validation process be done in a "validate" method
of the PdfADocument, or at the end of the parse method? In the first case,
the "validate" method could be called from the "save" method to check
whether a new PDF is a PDF/A. However, because the validation starts in the
parser (syntax and stream length checks), a "validate" method may not be
the right thing to do...
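
To make the first option concrete, here is a minimal sketch, assuming a
hypothetical PdfADocument with a validate() method that save() calls. The
class, its methods and the "no content" rule are illustrative only, not
existing PDFBox API and not a real PDF/A rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "validate() on the document" option.
class ValidateOnSaveSketch {

    static class PdfADocument {
        // Errors the parser found while loading (syntax, stream length).
        private final List<String> parseErrors = new ArrayList<>();
        private final List<String> pages = new ArrayList<>();

        void addParseError(String message) { parseErrors.add(message); }

        void addPage(String content) { pages.add(content); }

        // Finishes the validation: parse-time errors plus page-level checks.
        List<String> validate() {
            List<String> all = new ArrayList<>(parseErrors);
            for (int i = 0; i < pages.size(); i++) {
                if (pages.get(i).isEmpty()) { // toy rule, not a PDF/A rule
                    all.add("page " + (i + 1) + ": no content");
                }
            }
            return all;
        }

        // save() refuses to write a document that fails validation.
        void save(StringBuilder out) {
            List<String> errors = validate();
            if (!errors.isEmpty()) {
                throw new IllegalStateException("not a valid PDF/A: " + errors);
            }
            for (String page : pages) {
                out.append(page).append('\n');
            }
        }
    }
}
```

The sketch also shows the concern: a document built in memory never goes
through the parser, so its list of parse errors is empty and validate()
cannot cover the syntax and stream length checks.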


If you have any questions, do not hesitate to ask.


BR,
Eric



[1]https://issues.apache.org/jira/browse/PDFBOX-1312
