[ https://issues.apache.org/jira/browse/TIKA-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894877#action_12894877 ]
Staffan Olsson commented on TIKA-451:
-------------------------------------
The JPEG parser (TiffExtractor.handleCommonImageTags and JpegParserTest) has the
same issue.
The test asserts a date format that is not ISO 8601. The javadoc for the field
(DublinCore.DATE) says ISO 8601, so the test is clearly wrong. There is
a "TODO Make me a Date Property" on it. I have code for converting Metadata
Extractor's dates to ISO 8601, so I could fix this, but which field should we
use? This issue discusses MSOffice.CREATION_DATE, but I think DublinCore makes
more sense for images. However, Tika will be easier to use if there is only one
creation date field.
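
For illustration only, here is a minimal sketch of the kind of conversion I
mean. It is not the actual patch. It assumes the date arrives as a string in
the EXIF "yyyy:MM:dd HH:mm:ss" layout with no time zone; the class name
ExifDateToIso and the method toIso8601 are hypothetical, not Tika or Metadata
Extractor API.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class ExifDateToIso {
    // Assumed input layout: EXIF-style timestamp without a time zone.
    private static final String EXIF_PATTERN = "yyyy:MM:dd HH:mm:ss";
    // ISO 8601 local date-time, also without a zone designator.
    private static final String ISO_PATTERN = "yyyy-MM-dd'T'HH:mm:ss";

    /** Hypothetical helper: reformats an EXIF-style date as ISO 8601. */
    public static String toIso8601(String exifDate) {
        SimpleDateFormat in = new SimpleDateFormat(EXIF_PATTERN, Locale.ROOT);
        SimpleDateFormat out = new SimpleDateFormat(ISO_PATTERN, Locale.ROOT);
        in.setLenient(false); // reject values such as month 13
        // Use the same zone on both sides so the wall-clock time is
        // preserved regardless of the JVM's default time zone.
        TimeZone utc = TimeZone.getTimeZone("UTC");
        in.setTimeZone(utc);
        out.setTimeZone(utc);
        try {
            return out.format(in.parse(exifDate));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an EXIF date: " + exifDate, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toIso8601("2009:08:11 09:09:45")); // 2009-08-11T09:09:45
    }
}
```

Since the input carries no zone information, the sketch leaves the zone
designator off the output instead of guessing one.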
> Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED
> ------------------------------------------------------------------------------
>
> Key: TIKA-451
> URL: https://issues.apache.org/jira/browse/TIKA-451
> Project: Tika
> Issue Type: Improvement
> Components: metadata, parser
> Affects Versions: 0.7
> Reporter: Nick Burch
> Assignee: Nick Burch
> Priority: Minor
>
> Currently, the PDF parser calls calendar.getTime().toString(), which means
> dates end up in your local timezone and are hard to parse.
> The OpenDocument parsers output ISO 8601 format, which avoids these two
> problems.
> The POI OLE2-based parsers also output date.toString() format, with the
> same timezone and parsing problems.
> We should probably select one format and update all the parsers to output
> it.
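
The difference described in the issue can be sketched as follows. This is an
illustration under stated assumptions, not the code of any Tika parser: the
class name IsoDateDemo and the helper toIso8601Utc are hypothetical, and
normalizing to UTC is one possible choice of common format.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class IsoDateDemo {
    /** Hypothetical helper: formats a Date as ISO 8601 in UTC. */
    public static String toIso8601Utc(Date date) {
        // The quoted 'Z' is a literal; it is correct only because the
        // formatter's zone is pinned to UTC on the next line.
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        // Date.toString() renders in the JVM's default time zone, so the
        // same instant prints differently on differently configured machines.
        System.out.println(epoch.toString());
        // The ISO 8601 form is the same everywhere and easy to parse.
        System.out.println(toIso8601Utc(epoch)); // 1970-01-01T00:00:00Z
    }
}
```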