Peter Winckles created TIKA-3487:
------------------------------------
Summary: Timezones inappropriately set to GMT
Key: TIKA-3487
URL: https://issues.apache.org/jira/browse/TIKA-3487
Project: Tika
Issue Type: Bug
Components: metadata
Affects Versions: 1.27
Reporter: Peter Winckles
The code in
[ImageMetadataExtractor.handleDateTags|https://github.com/apache/tika/blob/dc571dddba324485fdb6dc1d665163e56267d0fc/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-image-module/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java#L542]
does not correctly handle Exif timestamps. The comments in the code correctly
describe how these timestamps should be handled, but the code does not behave
that way, because
[Directory.getDate|https://github.com/drewnoakes/metadata-extractor/blob/master/Source/com/drew/metadata/Directory.java#L828]
has already parsed the zone-less timestamp as if it were in GMT. So, instead
of writing the timestamp out as it was originally recorded, Tika writes it out
as though the recorded time were GMT, converted to the local time zone.
For example, I have an image with a Created Date of "2010:07:07 14:22:53". Tika
displays this as: {{<meta name="Creation-Date" content="2010-07-07T09:22:53"/>}}
This happens because the SimpleDateFormat in ImageMetadataExtractor that
formats the date uses my local time zone, US Central (UTC-5 in July, under
daylight saving time), while the Date object holds the instant 14:22:53 GMT.
The displayed value is therefore five hours earlier than the recorded one.
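A minimal reproduction of the shift in plain Java, without Tika or metadata-extractor. It assumes the behavior described above: a zone-less Exif string is parsed as if it were GMT. The hypothetical helper parseAsGmt stands in for Directory.getDate, and formatInZone roughly mirrors a SimpleDateFormat left on the local zone (the zone is passed explicitly here so the output does not depend on the machine running it).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not Tika or metadata-extractor code.
final class ExifGmtShiftDemo {

    // Exif date/time layout: "yyyy:MM:dd HH:mm:ss", with no time zone.
    static final String EXIF_PATTERN = "yyyy:MM:dd HH:mm:ss";

    // Stand-in for Directory.getDate: treat the recorded wall-clock time as GMT.
    static Date parseAsGmt(String exifValue) {
        SimpleDateFormat parser = new SimpleDateFormat(EXIF_PATTERN, Locale.ROOT);
        parser.setTimeZone(TimeZone.getTimeZone("GMT"));
        try {
            return parser.parse(exifValue);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an Exif date: " + exifValue, e);
        }
    }

    // Stand-in for the output formatter: render the instant in the given zone.
    static String formatInZone(Date date, String zoneId) {
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
        formatter.setTimeZone(TimeZone.getTimeZone(zoneId));
        return formatter.format(date);
    }

    public static void main(String[] args) {
        Date instant = parseAsGmt("2010:07:07 14:22:53");
        // US Central is UTC-5 in July (CDT), so the GMT instant renders 5 hours earlier.
        System.out.println(formatInZone(instant, "America/Chicago")); // 2010-07-07T09:22:53
        System.out.println(formatInZone(instant, "GMT"));             // 2010-07-07T14:22:53
    }
}
```

The first line printed matches the value in the report; only a formatter that happens to use GMT gives back the originally recorded time.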
I considered filing this issue against the metadata-extractor library, but did
not, because this behavior may be appropriate in some cases and it is clearly
documented in the API.
A possible solution for Tika would be to not use metadata-extractor to parse
the date at all, and to handle the parsing logic internally.
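A minimal sketch of that approach using java.time, assuming the raw Exif date string can be obtained before any conversion to java.util.Date. ExifLocalDate and toIsoLocal are hypothetical names, not existing Tika API. The string is parsed as a LocalDateTime, which carries no time zone, so the recorded wall-clock time is written out unchanged regardless of the JVM's default zone.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper; a sketch, not Tika's implementation.
final class ExifLocalDate {

    // Exif layout "yyyy:MM:dd HH:mm:ss". 'uuuu' (proleptic year) is used because
    // ResolverStyle.STRICT cannot resolve 'yyyy' (year-of-era) without an era.
    private static final DateTimeFormatter EXIF =
            DateTimeFormatter.ofPattern("uuuu:MM:dd HH:mm:ss", Locale.ROOT)
                    .withResolverStyle(ResolverStyle.STRICT);

    // ISO-8601 local date-time with no offset: the time as recorded by the camera.
    private static final DateTimeFormatter ISO_NO_ZONE =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss", Locale.ROOT);

    static Optional<String> toIsoLocal(String exifValue) {
        if (exifValue == null) {
            return Optional.empty();
        }
        try {
            LocalDateTime recorded = LocalDateTime.parse(exifValue.trim(), EXIF);
            return Optional.of(recorded.format(ISO_NO_ZONE));
        } catch (DateTimeParseException e) {
            // Blank or placeholder values such as "0000:00:00 00:00:00" fail
            // strict resolution and are skipped rather than guessed at.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(toIsoLocal("2010:07:07 14:22:53").orElse("<none>")); // 2010-07-07T14:22:53
    }
}
```

Keeping the value zone-less defers any time-zone decision until a zone is actually known for the image, instead of silently assuming GMT.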
[See page 33 of this
resource|https://web.archive.org/web/20180919181934/http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf]
for a thorough description of how these fields are supposed to be handled.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)