[ 
https://issues.apache.org/jira/browse/TIKA-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386577#comment-17386577
 ] 

Hudson commented on TIKA-3496:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #289 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/289/])
TIKA-3496 -- add a metadatafilter to allow users to ensure that dates emitted 
to Solr/OpenSearch are all UTC 'Z' formatted. (tallison: 
[https://github.com/apache/tika/commit/381b36e8cc9a83fb34f22a74eaf20b86c78b6274])
* (add) 
tika-pipes/tika-emitters/tika-emitter-solr/src/test/java/org/apache/tika/pipes/emitter/solr/SolrEmitterDevTest.java
* (edit) 
tika-integration-tests/tika-pipes-solr-integration-tests/src/test/resources/tika-config-solr-urls.xml
* (edit) 
tika-integration-tests/tika-pipes-opensearch-integration-tests/src/test/resources/opensearch/tika-config-opensearch.xml
* (add) 
tika-core/src/main/java/org/apache/tika/metadata/filter/DateNormalizingMetadataFilter.java
* (edit) 
tika-core/src/test/java/org/apache/tika/metadata/filter/TestMetadataFilter.java
* (edit) CHANGES.txt
* (edit) tika-core/src/main/java/org/apache/tika/metadata/Property.java


> Allow users to specify a default timezone when a file format doesn't store 
> the tz
> ---------------------------------------------------------------------------------
>
>                 Key: TIKA-3496
>                 URL: https://issues.apache.org/jira/browse/TIKA-3496
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Priority: Major
>             Fix For: 2.0.1
>
>
> In working on the Solr pipe emitter, I noticed that some dates as stored in 
> our Metadata do not have a timezone, which causes a problem for Solr.
> I noticed this issue in a JPEG with date: "2011-06-11T09:30:54". 
> In a comment in our JPEG parser, I see:
> {noformat}
> // Unless we have GPS time we don't know the time zone so date must be set
> // as ISO 8601 datetime without timezone suffix (no Z or +/-)
> {noformat}
> So, the question is should we try to add a timezone (arbitrarily assign 'Z') 
> in the Solr (and OpenSearch) emitter or should we store the date as if it 
> were Z in the JPEG parser?  
> Or do something else?
> The challenge with doing anything on the emitter side is that we aren't 
> currently storing the property type in the metadata.  So, at emit time, we 
> only have string keys and string values.  We can't easily guess which fields 
> should be a date in order to reformat for the sake of Solr. We could make a 
> request to Solr/OpenSearch to figure out what the field types are, but that 
> seems really awful...
> Ideas?
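
To make the problem above concrete, here is a minimal sketch of normalizing a date string to the UTC 'Z' form that Solr expects. It is not the code from the commit referenced in this message: the class name UtcDateNormalizer, the normalize method, and the defaultZone parameter are hypothetical and use only java.time. It assumes the caller already knows that the value is a date, which is exactly the part the description says is hard at emit time.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper for illustration only; not part of Tika.
public class UtcDateNormalizer {

    // Output format with a literal 'Z'; values are converted to UTC before formatting.
    private static final DateTimeFormatter UTC_Z =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");

    /**
     * Returns the value as a UTC 'Z' timestamp. A value without an offset is
     * interpreted in defaultZone (an assumption supplied by the caller).
     * A value that does not parse as an ISO 8601 datetime is returned unchanged.
     */
    public static String normalize(String value, ZoneId defaultZone) {
        try {
            // Case 1: the value already carries an offset ("Z" or "+02:00").
            return OffsetDateTime.parse(value)
                    .withOffsetSameInstant(ZoneOffset.UTC)
                    .format(UTC_Z);
        } catch (DateTimeParseException e) {
            // No offset present; fall through to the local-datetime case.
        }
        try {
            // Case 2: no offset, e.g. "2011-06-11T09:30:54" from a JPEG.
            return LocalDateTime.parse(value)
                    .atZone(defaultZone)
                    .withZoneSameInstant(ZoneOffset.UTC)
                    .format(UTC_Z);
        } catch (DateTimeParseException e) {
            // Not a date we recognize; leave the string as it was.
            return value;
        }
    }
}
```

Calling UtcDateNormalizer.normalize("2011-06-11T09:30:54", ZoneOffset.UTC) treats the JPEG date from the description as if it were already UTC and only appends the 'Z'. Passing a different ZoneId shifts the time accordingly, so the choice of default zone changes the emitted instant.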



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
