[ 
https://issues.apache.org/jira/browse/TIKA-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386493#comment-17386493
 ] 

Tim Allison commented on TIKA-3496:
-----------------------------------

Thank you, [~nick].  This makes sense.  I'll implement the least-worst option 
and require users to "opt in" by selecting a DateNormalizing metadata filter.

The good news: I hadn't realized that Property keeps a static map of all 
properties created, so we don't need to modify the Metadata object. Onwards, 
and thank you!
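
For illustration only, here is a minimal sketch of what opt-in date 
normalization could look like. It is not the actual Tika filter. The class 
name DateNormalizerSketch, both method names, and the dateKeys set (standing 
in for whatever the Property registry reports as date-typed) are 
hypothetical, and treating zone-less values as UTC is exactly the arbitrary 
"assign Z" choice discussed in this issue.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not part of Tika.
public class DateNormalizerSketch {

    /**
     * If the value parses as a zone-less ISO 8601 date-time, return it with
     * an explicit UTC designator ("Z"). Anything else is returned unchanged.
     * ASSUMPTION: zone-less dates are treated as UTC, which may be wrong.
     */
    public static String normalize(String value) {
        try {
            LocalDateTime local =
                    LocalDateTime.parse(value, DateTimeFormatter.ISO_LOCAL_DATE_TIME);
            return local.atOffset(ZoneOffset.UTC)
                    .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
        } catch (DateTimeParseException e) {
            // Already has a zone, or is not a date-time at all: leave as is.
            return value;
        }
    }

    /**
     * Return a copy of the metadata in which only the values stored under
     * the given date-typed keys have been normalized.
     */
    public static Map<String, String> normalizeDates(Map<String, String> metadata,
                                                     Set<String> dateKeys) {
        Map<String, String> out = new LinkedHashMap<>(metadata);
        for (String key : dateKeys) {
            out.computeIfPresent(key, (k, v) -> normalize(v));
        }
        return out;
    }
}
```

Because the caller has to supply the date-typed keys and invoke the 
normalization explicitly, users who do not opt in keep the zone-less 
strings exactly as the parsers wrote them.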

> Dates should have a timezone?
> -----------------------------
>
>                 Key: TIKA-3496
>                 URL: https://issues.apache.org/jira/browse/TIKA-3496
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Priority: Major
>
> In working on the Solr pipe emitter, I noticed that some dates as stored in 
> our Metadata do not have a timezone, which causes a problem for Solr.
> I noticed this issue in a JPEG with date: "2011-06-11T09:30:54". 
> In a comment in our JPEG parser, I see:
> {noformat}
> // Unless we have GPS time we don't know the time zone so date must be set
>                 // as ISO 8601 datetime without timezone suffix (no Z or +/-)
> {noformat}
> So, the question is should we try to add a timezone (arbitrarily assign 'Z') 
> in the Solr (and OpenSearch) emitter or should we store the date as if it 
> were Z in the JPEG parser?  
> Or do something else?
> The challenge with doing anything on the emitter side is that we aren't 
> currently storing the property type in the metadata.  So, at emit time, we 
> only have string keys and string values.  We can't easily guess which fields 
> should be a date in order to reformat for the sake of Solr. We could make a 
> request to Solr/OpenSearch to figure out what the field types are, but that 
> seems really awful...
> Ideas?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
