[
https://issues.apache.org/jira/browse/TIKA-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884415#action_12884415
]
Nick Burch commented on TIKA-451:
---------------------------------
OK, makes sense to me.
As we have several parsers which currently have a Date object (or a Calendar
one that can yield a Date), we probably want to put the Date -> ISO 8601 string
conversion in one place to avoid duplication. I think adding lots of overloaded
methods to the Metadata object might make things a little ugly (e.g. set and
add, each with String and Property keys, possibly for both Date and Calendar).
One option I see is a single overloaded set(Property,Date). We shouldn't need
to handle multiple Dates for one key, so we don't need a matching add. This
would involve switching a couple of the Metadata keys from String to Property,
though I don't think this should affect many users, if any.
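A rough sketch of what this first option could look like. SketchMetadata and
SketchProperty are hypothetical stand-ins rather than the real Tika classes,
and the UTC, second-precision output pattern is only an assumption about which
ISO 8601 form we would settle on:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical stand-in for a typed metadata key; not Tika's real Property.
final class SketchProperty {
    final String name;

    SketchProperty(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for Metadata, reduced to single-valued storage.
class SketchMetadata {
    private final Map<String, String> values = new HashMap<>();

    void set(SketchProperty property, String value) {
        values.put(property.name, value);
    }

    // The proposed overload: the Date -> ISO 8601 conversion lives only here.
    void set(SketchProperty property, Date date) {
        // SimpleDateFormat is not thread-safe, so build one per call.
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        // Format in UTC so the result does not depend on the local timezone.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        set(property, fmt.format(date));
    }

    String get(SketchProperty property) {
        return values.get(property.name);
    }
}
```

A parser holding a Calendar would then call something like
metadata.set(someDateProperty, calendar.getTime()) and never build the string
itself.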
The other option is to add static helper methods, probably on Metadata though
they needn't be, something like "public static String formatDate(Date d)"
and "public static String formatDate(Calendar c)". We would then keep the rest
of the Metadata object as-is, and require the parsers to use the helper to
convert the date to a string before storing it.
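A minimal sketch of the helper approach, for comparison. The IsoDateHelper
class name is hypothetical (the methods could equally sit on Metadata), and
the UTC, second-precision pattern is an assumption rather than an agreed
format:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class; the name and output pattern are assumptions.
final class IsoDateHelper {
    private IsoDateHelper() {
    }

    // Formats a Date as an ISO 8601 timestamp in UTC.
    static String formatDate(Date d) {
        // SimpleDateFormat is not thread-safe, so build one per call.
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    // A Calendar can yield a Date, so this simply delegates.
    static String formatDate(Calendar c) {
        return formatDate(c.getTime());
    }
}
```

With this in place a parser would store IsoDateHelper.formatDate(calendar)
through the existing String-based set, so the Metadata keys could stay as
they are.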
Since we do have set(Property,int), I'd probably lean towards the former
option. What does everyone else think?
Nick
> Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED
> ------------------------------------------------------------------------------
>
> Key: TIKA-451
> URL: https://issues.apache.org/jira/browse/TIKA-451
> Project: Tika
> Issue Type: Improvement
> Components: metadata, parser
> Affects Versions: 0.7
> Reporter: Nick Burch
> Priority: Minor
>
> Currently, the PDF parser calls calendar.getTime().toString(), which means
> dates end up in your local timezone and are hard to parse.
> The OpenDocument parsers output in ISO 8601 format, which avoids these two
> problems.
> The POI OLE2-based parsers also output in date.toString() format, with the
> same timezone/parsing problems.
> We should probably select one format, and update all the parsers to output
> in it.