[ https://issues.apache.org/jira/browse/TIKA-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886119#action_12886119 ]
Jukka Zitting commented on TIKA-451:
------------------------------------
I would only do property type checks in type-specific setters like setDate() or
setInteger(). I'd allow the generic set() method with a string argument to
always succeed. That way, parsing a document does not break even if some of
its metadata fields do not match our expectations.

Similarly, I'd avoid throwing any exceptions from metadata getters. The
type-specific getters should probably treat a malformed metadata value as if
it were missing, and the generic get() method should return it as-is.
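
As a rough illustration of the behaviour proposed above, here is a minimal
sketch. LenientMetadata is a hypothetical stand-in, not Tika's actual Metadata
class, and storing dates as ISO 8601 strings in UTC is an assumption made for
the example rather than a decided format.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a metadata container; not Tika's actual Metadata class.
final class LenientMetadata {
    private final Map<String, String> values = new HashMap<>();

    // Generic setter: accepts any string, performs no validation, never throws.
    void set(String name, String value) {
        values.put(name, value);
    }

    // Type-specific setters: the argument type itself guarantees a well-formed value.
    void setInteger(String name, int value) {
        values.put(name, Integer.toString(value));
    }

    void setDate(String name, Date value) {
        // Assumption: dates are stored as ISO 8601 in UTC (Instant.toString()).
        values.put(name, value.toInstant().toString());
    }

    // Generic getter: returns the stored string as-is, or null if nothing is stored.
    String get(String name) {
        return values.get(name);
    }

    // Type-specific getters: a malformed value is treated as missing (null).
    Integer getInteger(String name) {
        String raw = values.get(name);
        if (raw == null) {
            return null;
        }
        try {
            return Integer.valueOf(raw.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    Date getDate(String name) {
        String raw = values.get(name);
        if (raw == null) {
            return null;
        }
        try {
            return Date.from(Instant.parse(raw.trim()));
        } catch (DateTimeParseException | IllegalArgumentException e) {
            return null;
        }
    }
}
```

With this sketch, set("pages", "not-a-number") succeeds, get("pages") returns
the raw string, and getInteger("pages") returns null instead of throwing.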
> Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED
> ------------------------------------------------------------------------------
>
> Key: TIKA-451
> URL: https://issues.apache.org/jira/browse/TIKA-451
> Project: Tika
> Issue Type: Improvement
> Components: metadata, parser
> Affects Versions: 0.7
> Reporter: Nick Burch
> Assignee: Nick Burch
> Priority: Minor
>
> Currently, the PDF parser calls calendar.getTime().toString(), which means
> dates end up in your local timezone and are hard to parse.
> The OpenDocument parsers output dates in ISO 8601 format, which avoids these
> two problems.
> The POI OLE2-based parsers also output in date.toString() format, with the
> same timezone and parsing problems.
> We should probably select one format and update all the parsers to output in
> it.
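
The timezone problem described in the issue can be reproduced with a short
sketch. DateFormatDemo and toIso8601 are hypothetical names used only for this
example, and UTC is an assumed choice for the ISO 8601 output. The same instant
is printed with Date.toString() and as ISO 8601 under several default time
zones: the first varies with the zone, the second does not.

```java
import java.time.Instant;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical demo class; not part of Tika.
final class DateFormatDemo {
    // Formats a date as ISO 8601 in UTC, independent of the JVM default time zone.
    static String toIso8601(Date date) {
        Instant instant = date.toInstant();
        return instant.toString();
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        for (String zone : new String[] {"UTC", "America/New_York", "Asia/Tokyo"}) {
            // Date.toString() renders in whatever the current default zone is.
            TimeZone.setDefault(TimeZone.getTimeZone(zone));
            System.out.println(zone + " toString(): " + epoch);
            System.out.println(zone + " ISO 8601:   " + toIso8601(epoch));
        }
    }
}
```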