Cetra Free created PDFBOX-2356:
----------------------------------
Summary: Error Validating PDF Archive Document
Key: PDFBOX-2356
URL: https://issues.apache.org/jira/browse/PDFBOX-2356
Project: PDFBox
Issue Type: Bug
Components: Preflight
Affects Versions: 1.8.6, 1.8.5, 1.8.4
Reporter: Cetra Free
When trying to validate a PDF archive file (attached to this ticket) we get the
following error:
{code}
7.2 - Error on MetaData, ModificationDate present in the document catalog
dictionary doesn't match with XMP information
{code}
This is because the the Modification Date in the Dictionary is parsed
differently from the XMP Metadata. The XMP Metadata is correct, but the Date
from the Dictionary appends an extra 30 minutes.
The following is the raw COSObject from the PDF File
{code}
COSString{D:20140917122850+09'30'}
{code}
The Long value should be *1410922730000*
The *org.apache.pdfbox.util.DateConverter* *parseDate* method returns the Date
with Long *1410924530000* which is 30 minutes ahead.
XMP Modification Date is parsed differently and returns the correct date.
This means that validation will fail for PDF Archives.
My suggestion would be to refactor the parseDate function to use the Standard
Java library.
Here's an example class which will be compatible with the PDF Specification:
{code}
static class DateParser {
private Map<Integer, SimpleDateFormat> formats =
new HashMap<Integer, SimpleDateFormat>();
public DateParser() {
String expr = "";
for(String part: Arrays.asList("yyyy", "MM", "dd", "HH", "mm", "ss", "Z")) {
expr = expr + part;
formats.put(expr.length(), new SimpleDateFormat(expr));
}
}
public Calendar parseDate(String expr) {
try {
expr = expr.replace("D:", "").replace("'", "").replace("Z", "+0000");
Date date = formats.get(Math.min(expr.length(), 15)).parse(expr);
Calendar calendar = Calendar.getInstance();
calendar.setTime(date);
return calendar;
} catch (ParseException e) {
return null;
}
}
}
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)