[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document

Cetra Free (JIRA) Wed, 17 Sep 2014 16:26:47 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138197#comment-14138197
 ]


Cetra Free commented on PDFBOX-2356:
------------------------------------

I'm just using the code from here:

http://pdfbox.apache.org/cookbook/pdfavalidation.html

{code}
ValidationResult result = null;

FileDataSource fd = new FileDataSource(args[0]);
PreflightParser parser = new PreflightParser(fd);
try {

  /* Parse the PDF file with PreflightParser, which inherits from
   * NonSequentialParser. Some additional controls are present to check
   * a set of PDF/A requirements (stream length consistency, EOL after
   * some keywords...).
   */
  parser.parse();

  /* Once the syntax validation is done,
   * the parser can provide a PreflightDocument
   * (that inherits from PDDocument).
   * This document performs the rest of the PDF/A validation.
   */
  PreflightDocument document = parser.getPreflightDocument();
  document.validate();

  // Get validation result
  result = document.getResult();
  document.close();

} catch (SyntaxValidationException e) {
  /* The parse method can throw a SyntaxValidationException
   * if the PDF file can't be parsed.
   * In this case, the exception contains an instance of ValidationResult.
   */
  result = e.getResult();
}

// display validation result
if (result.isValid()) {
  System.out.println("The file " + args[0] + " is a valid PDF/A-1b file");
} else {
  System.out.println("The file " + args[0] + " is not valid, error(s) :");
  for (ValidationError error : result.getErrorsList()) {
    System.out.println(error.getErrorCode() + " : " + error.getDetails());
  }
}
{code}


> Error Validating PDF Archive Document
> -------------------------------------
>
>                 Key: PDFBOX-2356
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2356
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Preflight
>    Affects Versions: 1.8.4, 1.8.5, 1.8.6
>            Reporter: Cetra Free
>         Attachments: pdfafile.pdf
>
>
> When trying to validate a PDF archive file (attached to this ticket) we get 
> the following error:
> {code}
> 7.2   - Error on MetaData, ModificationDate present in the document catalog 
> dictionary doesn't match with XMP information
> {code}
> This is because the Modification Date in the Dictionary is parsed 
> differently from the XMP Metadata.  The XMP Metadata is correct, but the Date 
> from the Dictionary appends an extra 30 minutes.
> The following is the raw COSObject from the PDF File
> {code}
> COSString{D:20140917122850+09'30'}
> {code}
> The Long value should be *1410922730000*
> The *org.apache.pdfbox.util.DateConverter* *parseDate* method returns the 
> Date with Long *1410924530000* which is 30 minutes ahead.
> XMP Modification Date is parsed differently and returns the correct date.
> This means that validation will fail for PDF Archives.
> My suggestion would be to refactor the parseDate function to use the Standard 
> Java library.
> Here's an example class which will be compatible with the PDF Specification:
> {code}
> static class DateParser {
>  private Map<Integer, SimpleDateFormat> formats =
>    new HashMap<Integer, SimpleDateFormat>();
>  
>  public DateParser() {
>    String expr = "";
>  
>    for (String part : Arrays.asList("yyyy", "MM", "dd", "HH", "mm", "ss", "Z")) {
>      expr = expr + part;
>      formats.put(expr.length(), new SimpleDateFormat(expr));
>    }
>  }
>  
>  public Calendar parseDate(String expr) {
>    try {
>      expr = expr.replace("D:", "").replace("'", "").replace("Z", "+0000");
>      Date date = formats.get(Math.min(expr.length(), 15)).parse(expr);
>  
>  
>      Calendar calendar =  Calendar.getInstance();
>      calendar.setTime(date);
>  
>      return calendar;
>    } catch (ParseException e) {
>      return null;
>    }
>  }
> }
> {code}
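
As a cross-check of the expected Long value quoted above, here is a minimal sketch that parses the same raw string with java.time (Java 8 or later) instead of PDFBox's DateConverter. The class name PdfDateCheck and the method toEpochMillis are hypothetical, made up for illustration, and the sketch only handles the full yyyyMMddHHmmss+HH'mm' form, not the shorter variants.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of PDFBox: converts a full PDF date string
// such as D:20140917122850+09'30' to milliseconds since the epoch.
final class PdfDateCheck {

    // "xx" matches an offset written as +HHmm, e.g. +0930.
    private static final DateTimeFormatter FULL =
        DateTimeFormatter.ofPattern("uuuuMMddHHmmssxx");

    static long toEpochMillis(String pdfDate) {
        // Strip the "D:" prefix and the apostrophes around the offset minutes.
        String s = pdfDate.replace("D:", "").replace("'", "");
        return OffsetDateTime.parse(s, FULL).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        long expected = toEpochMillis("D:20140917122850+09'30'");
        System.out.println(expected);                  // prints 1410922730000

        // Difference from the value reported for DateConverter.parseDate.
        long reported = 1410924530000L;
        System.out.println((reported - expected) / 60000 + " minutes"); // prints 30 minutes
    }
}
```

12:28:50 at +09:30 is 02:58:50 UTC, which is where 1410922730000 comes from; the value reported in the description is exactly 1,800,000 ms later.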



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
