[
https://issues.apache.org/jira/browse/TIKA-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386577#comment-17386577
]
Hudson commented on TIKA-3496:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #289 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/289/])
TIKA-3496 -- add a metadatafilter to allow users to ensure that dates emitted
to Solr/OpenSearch are all UTC 'Z' formatted. (tallison:
[https://github.com/apache/tika/commit/381b36e8cc9a83fb34f22a74eaf20b86c78b6274])
* (add)
tika-pipes/tika-emitters/tika-emitter-solr/src/test/java/org/apache/tika/pipes/emitter/solr/SolrEmitterDevTest.java
* (edit)
tika-integration-tests/tika-pipes-solr-integration-tests/src/test/resources/tika-config-solr-urls.xml
* (edit)
tika-integration-tests/tika-pipes-opensearch-integration-tests/src/test/resources/opensearch/tika-config-opensearch.xml
* (add)
tika-core/src/main/java/org/apache/tika/metadata/filter/DateNormalizingMetadataFilter.java
* (edit)
tika-core/src/test/java/org/apache/tika/metadata/filter/TestMetadataFilter.java
* (edit) CHANGES.txt
* (edit) tika-core/src/main/java/org/apache/tika/metadata/Property.java
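The sketch below shows the kind of normalization the commit message describes. It assumes date values are ISO-8601 strings that either carry an offset or, like the JPEG date in the issue, carry none. A value with an offset is converted to UTC. A value without one is interpreted in a caller-supplied default zone (the "default timezone" from the issue title) and then converted. The names DateNormalizer and normalizeToUtc are hypothetical. This is not the code of DateNormalizingMetadataFilter.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeParseException;

// Hypothetical helper for illustration; not Tika's DateNormalizingMetadataFilter.
final class DateNormalizer {

    private DateNormalizer() {
    }

    /**
     * Returns the value as a UTC instant string such as "2011-06-11T09:30:54Z".
     * Values that already carry an offset ("Z", "+02:00", ...) keep their instant
     * and are re-expressed in UTC. Values without an offset are assumed to be
     * local times in defaultZone.
     */
    static String normalizeToUtc(String value, ZoneId defaultZone) {
        try {
            // Offset present: Instant.toString() prints ISO-8601 in UTC with a trailing 'Z'.
            return OffsetDateTime.parse(value).toInstant().toString();
        } catch (DateTimeParseException e) {
            // No offset (e.g. an EXIF date from a JPEG): apply the default zone below.
        }
        LocalDateTime local = LocalDateTime.parse(value);
        return local.atZone(defaultZone).toInstant().toString();
    }
}
```

Date-only values such as "2011-06-11" are not handled here and would throw DateTimeParseException, so a real filter would also need a policy for them.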
> Allow users to specify a default timezone when a file format doesn't store
> the tz
> ---------------------------------------------------------------------------------
>
> Key: TIKA-3496
> URL: https://issues.apache.org/jira/browse/TIKA-3496
> Project: Tika
> Issue Type: Bug
> Reporter: Tim Allison
> Priority: Major
> Fix For: 2.0.1
>
>
> While working on the Solr pipe emitter, I noticed that some dates stored in
> our Metadata do not have a timezone, which causes a problem for Solr.
> I saw this in a JPEG with the date "2011-06-11T09:30:54".
> In a comment in our JPEG parser, I see:
> {noformat}
> // Unless we have GPS time we don't know the time zone so date must be set
> // as ISO 8601 datetime without timezone suffix (no Z or +/-)
> {noformat}
> So the question is: should we add a timezone (arbitrarily assign 'Z')
> in the Solr (and OpenSearch) emitter, or should we store the date as if it
> were Z in the JPEG parser?
> Or do something else?
> The challenge with doing anything on the emitter side is that we aren't
> currently storing the property type in the metadata. So, at emit time, we
> only have string keys and string values. We can't easily guess which fields
> should be dates in order to reformat them for Solr. We could make a
> request to Solr/OpenSearch to figure out what the field types are, but that
> seems really awful...
> Ideas?
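This sketch makes the emitter-side problem concrete. At emit time a document is just string keys mapped to string values, so any normalizer has to be told which keys hold dates. The metadata is modeled here as a plain Map<String, List<String>> rather than Tika's Metadata class. The explicit dateKeys set stands in for the property-type information the emitter lacks. EmitTimeDateFixer and fixDates are hypothetical names, and the key names in the usage check are only examples.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical emit-time pass over string-only metadata; not a Tika API.
final class EmitTimeDateFixer {

    private EmitTimeDateFixer() {
    }

    /**
     * Returns a new map in which the values of the keys listed in dateKeys are
     * rewritten as UTC 'Z' instants. Other keys are copied unchanged, because
     * nothing in the strings themselves says whether they are dates.
     */
    static Map<String, List<String>> fixDates(Map<String, List<String>> metadata,
                                              Set<String> dateKeys, ZoneId defaultZone) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            if (!dateKeys.contains(e.getKey())) {
                out.put(e.getKey(), new ArrayList<>(e.getValue()));
                continue;
            }
            List<String> fixed = new ArrayList<>();
            for (String v : e.getValue()) {
                fixed.add(toUtc(v, defaultZone));
            }
            out.put(e.getKey(), fixed);
        }
        return out;
    }

    private static String toUtc(String value, ZoneId defaultZone) {
        try {
            // Offset present: keep the instant, print it in UTC with 'Z'.
            return OffsetDateTime.parse(value).toInstant().toString();
        } catch (DateTimeParseException e) {
            // No offset in the string: assume it is a local time in defaultZone.
            return LocalDateTime.parse(value).atZone(defaultZone).toInstant().toString();
        }
    }
}
```

The dateKeys set is exactly the information the emitter does not have. Supplying it requires either carrying property types through to emit time or asking Solr/OpenSearch for its schema, which are the two options weighed above.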
--
This message was sent by Atlassian Jira
(v8.3.4#803005)