[ 
https://issues.apache.org/jira/browse/TIKA-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886119#action_12886119
 ] 

Jukka Zitting commented on TIKA-451:
------------------------------------

I would only do property type checks in type-specific setters like setDate() or 
setInteger(). I'd allow the generic set() method with a string argument to 
always succeed. That way, parsing a document does not fail just because some 
of its metadata fields do not match the format we expect.

Similarly, I'd avoid throwing any exceptions from metadata getters. The 
type-specific getters should probably treat a malformed metadata value as if 
it were missing, and the generic get() method should return it as-is.
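
A minimal sketch of that policy, not the actual Tika Metadata API. The class 
names LenientMetadata and SketchProperty, the Type enum, and the use of 
Optional and java.time.Instant are all assumptions made for illustration. 
Only the method names set(), get(), setDate() and setInteger() come from the 
comment above.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical property descriptor: a name plus the declared value type.
final class SketchProperty {
    enum Type { TEXT, INTEGER, DATE }

    final String name;
    final Type type;

    SketchProperty(String name, Type type) {
        this.name = name;
        this.type = type;
    }
}

// Hypothetical metadata container illustrating the lenient policy.
final class LenientMetadata {
    private final Map<String, String> values = new HashMap<>();

    // Generic setter: accepts any string and never validates it.
    void set(String name, String value) {
        values.put(name, value);
    }

    // Generic getter: returns the stored string as-is, or null if absent.
    String get(String name) {
        return values.get(name);
    }

    // Type-specific setters: the only place where the property type is checked.
    void setInteger(SketchProperty property, int value) {
        requireType(property, SketchProperty.Type.INTEGER);
        values.put(property.name, Integer.toString(value));
    }

    void setDate(SketchProperty property, Instant value) {
        requireType(property, SketchProperty.Type.DATE);
        values.put(property.name, value.toString()); // ISO 8601 in UTC
    }

    // Type-specific getters: a malformed value is reported as missing.
    Optional<Integer> getInteger(SketchProperty property) {
        String raw = values.get(property.name);
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    Optional<Instant> getDate(SketchProperty property) {
        String raw = values.get(property.name);
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Instant.parse(raw.trim()));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    private static void requireType(SketchProperty property, SketchProperty.Type expected) {
        if (property.type != expected) {
            throw new IllegalArgumentException(
                    property.name + " is declared as " + property.type + ", not " + expected);
        }
    }
}
```

With this shape, a parser that calls set("created", "not a date") still 
completes. get("created") hands back the raw string, while getDate() on the 
same property reports no value instead of throwing.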


> Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED
> ------------------------------------------------------------------------------
>
>                 Key: TIKA-451
>                 URL: https://issues.apache.org/jira/browse/TIKA-451
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata, parser
>    Affects Versions: 0.7
>            Reporter: Nick Burch
>            Assignee: Nick Burch
>            Priority: Minor
>
> Currently, the PDF parser calls   calendar.getTime().toString()   which means 
> dates end up in your local timezone and are hard to parse.
> The OpenDocument parsers output dates in ISO 8601 format, which avoids both 
> problems.
> The POI OLE2-based parsers also output dates in date.toString() format, with 
> the same timezone and parsing problems.
> We should probably select one format and update all the parsers to output 
> dates in it.
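
For illustration, a small sketch of the difference between the two formats 
named in the issue. The helper class IsoDates and its method names are 
hypothetical, and nothing here is taken from the Tika parsers themselves. It 
uses java.time (Java 8 or later) to render a java.util.Date as an ISO 8601 
string in UTC, which does not depend on the local timezone or locale and can 
be parsed back without guessing.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper; the name and methods are illustrative only.
final class IsoDates {
    private IsoDates() {
    }

    // Formats the date as ISO 8601 in UTC, independent of the default timezone.
    static String toIso8601(Date date) {
        return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
    }

    // Parses an ISO 8601 instant such as 1970-01-01T00:00:00Z back into a Date.
    static Date fromIso8601(String text) {
        return Date.from(Instant.parse(text));
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        // Date.toString() depends on the JVM's default timezone.
        System.out.println(epoch.toString());
        // The ISO 8601 form is the same on every machine.
        System.out.println(toIso8601(epoch)); // prints 1970-01-01T00:00:00Z
    }
}
```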

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
