[ https://issues.apache.org/jira/browse/TIKA-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884415#action_12884415 ]

Nick Burch commented on TIKA-451:
---------------------------------

OK, makes sense to me.

Several parsers currently hold a Date object (or a Calendar that can yield a
Date), so we probably want to put the Date -> ISO 8601 string conversion in one
place to avoid duplication. I think adding lots of overloaded methods to the
Metadata object might make things a little ugly (e.g. set and add, each taking
String or Property keys, possibly for both Date and Calendar...).

One option I see is a single overloaded set(Property,Date). We shouldn't need
to handle multiple Dates for one key, so there is no need for a matching add.
This would involve switching a couple of the Metadata keys from String to
Property, though I don't think that should affect many users, if any.
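
As a rough sketch of that first option: SketchMetadata and SketchProperty
below are hypothetical stand-ins, not the real Tika classes, and the
"Creation-Date" key name and the UTC "Z" output format are assumptions made
only for illustration. The point is just that the Date overload owns the
conversion, so parsers never build the string themselves.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

/** Hypothetical stand-in for a typed metadata key (not Tika's Property). */
final class SketchProperty {
    private final String name;

    SketchProperty(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

/** Hypothetical stand-in for the Metadata object, reduced to what the sketch needs. */
final class SketchMetadata {
    private final Map<String, String> values = new HashMap<String, String>();

    /** Mirrors the existing set(Property,int) style: convert, then store a string. */
    void set(SketchProperty property, int value) {
        values.put(property.getName(), Integer.toString(value));
    }

    /** Proposed overload: the Date -> ISO 8601 conversion lives here, in one place. */
    void set(SketchProperty property, Date date) {
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        // Always format in UTC so the result does not depend on the local timezone.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        values.put(property.getName(), fmt.format(date));
    }

    String get(SketchProperty property) {
        return values.get(property.getName());
    }

    public static void main(String[] args) {
        SketchProperty creationDate = new SketchProperty("Creation-Date");
        SketchMetadata metadata = new SketchMetadata();
        metadata.set(creationDate, new Date(0L));
        System.out.println(metadata.get(creationDate)); // 1970-01-01T00:00:00Z
    }
}
```

A parser holding a Calendar would pass calendar.getTime() to the same
overload, which is why a separate Calendar variant may not be needed.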

The other option is to add static helper methods, probably on Metadata though
they needn't be, along the lines of "public static String formatDate(Date d)"
and "public static String formatDate(Calendar c)". The rest of the Metadata
object would stay as-is, and parsers would be required to use the helper to do
the date -> string conversion before storing the string.
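
A minimal sketch of what such helpers might look like follows. The class name
DateFormatSketch is hypothetical, and the choice of UTC with a literal "Z"
suffix is an assumption about which ISO 8601 variant we would settle on.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical home for the helpers; they could equally live on Metadata. */
final class DateFormatSketch {
    private DateFormatSketch() {
    }

    /** Formats a Date as an ISO 8601 timestamp in UTC. */
    public static String formatDate(Date d) {
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        // Formatting in UTC keeps the output independent of the local timezone.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    /** Formats the instant a Calendar represents, ignoring the Calendar's own zone. */
    public static String formatDate(Calendar c) {
        return formatDate(c.getTime());
    }

    public static void main(String[] args) {
        System.out.println(formatDate(new Date(0L))); // 1970-01-01T00:00:00Z
    }
}
```

With this approach a parser would call something like
metadata.set(key, DateFormatSketch.formatDate(date)), so using the common
format is a convention the parsers have to follow rather than something the
Metadata API enforces.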

Since we do have set(Property,int), I'd probably lean towards the former 
option. What does everyone else think?

Nick

> Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED
> ------------------------------------------------------------------------------
>
>                 Key: TIKA-451
>                 URL: https://issues.apache.org/jira/browse/TIKA-451
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata, parser
>    Affects Versions: 0.7
>            Reporter: Nick Burch
>            Priority: Minor
>
> Currently, the PDF Parser does   calendar.getTime().toString()   which means
> dates end up in your local timezone and are hard to parse.
> The Open Document parsers output in ISO 8601 format, which avoids these two
> problems.
> The POI OLE2-based parsers also output in date.toString() format, with the
> same timezone and parsing problems.
> We should probably select one format and update the parsers to all output in
> it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
