[
https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138197#comment-14138197
]
Cetra Free commented on PDFBOX-2356:
------------------------------------
I'm just using the code from here:
http://pdfbox.apache.org/cookbook/pdfavalidation.html
{code}
ValidationResult result = null;
FileDataSource fd = new FileDataSource(args[0]);
PreflightParser parser = new PreflightParser(fd);
try {
    /* Parse the PDF file with PreflightParser, which inherits from
     * NonSequentialParser.
     * Some additional controls are present to check a set of PDF/A requirements
     * (stream length consistency, EOL after some keywords...).
     */
    parser.parse();
    /* Once the syntax validation is done,
     * the parser can provide a PreflightDocument
     * (which inherits from PDDocument).
     * This document performs the rest of the PDF/A validation.
     */
    PreflightDocument document = parser.getPreflightDocument();
    document.validate();
    // Get validation result
    result = document.getResult();
    document.close();
} catch (SyntaxValidationException e) {
    /* The parse method can throw a SyntaxValidationException
     * if the PDF file can't be parsed.
     * In this case, the exception contains an instance of ValidationResult.
     */
    result = e.getResult();
}
// Display validation result
if (result.isValid()) {
    System.out.println("The file " + args[0] + " is a valid PDF/A-1b file");
} else {
    System.out.println("The file " + args[0] + " is not valid, error(s) :");
    for (ValidationError error : result.getErrorsList()) {
        System.out.println(error.getErrorCode() + " : " + error.getDetails());
    }
}
{code}
> Error Validating PDF Archive Document
> -------------------------------------
>
> Key: PDFBOX-2356
> URL: https://issues.apache.org/jira/browse/PDFBOX-2356
> Project: PDFBox
> Issue Type: Bug
> Components: Preflight
> Affects Versions: 1.8.4, 1.8.5, 1.8.6
> Reporter: Cetra Free
> Attachments: pdfafile.pdf
>
>
> When trying to validate a PDF archive file (attached to this ticket) we get
> the following error:
> {code}
> 7.2 - Error on MetaData, ModificationDate present in the document catalog
> dictionary doesn't match with XMP information
> {code}
> This is because the Modification Date in the Dictionary is parsed
> differently from the XMP Metadata. The XMP Metadata is correct, but the Date
> parsed from the Dictionary comes out 30 minutes ahead.
> The following is the raw COSObject from the PDF File
> {code}
> COSString{D:20140917122850+09'30'}
> {code}
> The Long value should be *1410922730000*
> The *org.apache.pdfbox.util.DateConverter* *parseDate* method returns the
> Date with Long *1410924530000* which is 30 minutes ahead.
> XMP Modification Date is parsed differently and returns the correct date.
> This means that validation will fail for PDF Archives.
> My suggestion would be to refactor the parseDate function to use the Standard
> Java library.
> Here's an example class which will be compatible with the PDF Specification:
> {code}
> static class DateParser {
>     private Map<Integer, SimpleDateFormat> formats =
>         new HashMap<Integer, SimpleDateFormat>();
>
>     public DateParser() {
>         String expr = "";
>
>         for (String part : Arrays.asList("yyyy", "MM", "dd", "HH", "mm", "ss", "Z")) {
>             expr = expr + part;
>             formats.put(expr.length(), new SimpleDateFormat(expr));
>         }
>     }
>
>     public Calendar parseDate(String expr) {
>         try {
>             expr = expr.replace("D:", "").replace("'", "").replace("Z", "+0000");
>             Date date = formats.get(Math.min(expr.length(), 15)).parse(expr);
>
>             Calendar calendar = Calendar.getInstance();
>             calendar.setTime(date);
>
>             return calendar;
>         } catch (ParseException e) {
>             return null;
>         }
>     }
> }
> {code}
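
As a cross-check of the figures quoted above, here is a minimal sketch that parses the raw date string using only the standard Java library. The class name PdfDateCheck and the method toEpochMillis are made up for this illustration and are not part of PDFBox. The sketch only handles the full D:yyyyMMddHHmmss+HH'mm' form, not the shorter variants the PDF specification allows.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

class PdfDateCheck {
    // Hypothetical helper (not a PDFBox API): converts a full-form PDF date
    // string such as D:20140917122850+09'30' to epoch milliseconds.
    static long toEpochMillis(String pdfDate) {
        // Strip the "D:" prefix and the apostrophes so that +09'30' becomes
        // +0930, which the RFC 822 "Z" pattern letter can parse.
        String normalized = pdfDate.replace("D:", "").replace("'", "");
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmmssZ", Locale.ROOT);
        try {
            return format.parse(normalized).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a full-form PDF date: " + pdfDate, e);
        }
    }

    public static void main(String[] args) {
        long expected = toEpochMillis("D:20140917122850+09'30'");
        System.out.println(expected); // prints 1410922730000
        // Difference to the value reported for DateConverter.parseDate
        System.out.println((1410924530000L - expected) / 60000 + " minutes"); // prints "30 minutes"
    }
}
```

The explicit +09:30 offset in the string determines the result, so the output does not depend on the default time zone of the machine running it.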