[
https://issues.apache.org/jira/browse/TIKA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris A. Mattmann resolved TIKA-1369.
-------------------------------------
Resolution: Fixed
Fix Version/s: 1.7
Assignee: Chris A. Mattmann
merged in r1629347.
> Date parsing and thread safety in ImageMetadataExtractor
> --------------------------------------------------------
>
> Key: TIKA-1369
> URL: https://issues.apache.org/jira/browse/TIKA-1369
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.5
> Environment: OS X 10.9.4 Java 7_60
> Reporter: John Gibson
> Assignee: Chris A. Mattmann
> Priority: Critical
> Fix For: 1.7
>
>
> The {{ImageMetadataExtractor}} uses a static instance of
> {{SimpleDateFormat}}. This is not thread safe.
> {code:title=ImageMetadataExtractor.java}
> static class ExifHandler implements DirectoryHandler {
>     private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
>     ...
>     public void handleDateTags(Directory directory, Metadata metadata)
>             throws MetadataException {
>         // Date/Time Original overrides value from ExifDirectory.TAG_DATETIME
>         Date original = null;
>         if (directory.containsTag(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)) {
>             original = directory.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL);
>             // Unless we have GPS time we don't know the time zone so date must be set
>             // as ISO 8601 datetime without timezone suffix (no Z or +/-)
>             if (original != null) {
>                 String datetimeNoTimeZone = DATE_UNSPECIFIED_TZ.format(original); // Same time zone as Metadata Extractor uses
>                 metadata.set(TikaCoreProperties.CREATED, datetimeNoTimeZone);
>                 metadata.set(Metadata.ORIGINAL_DATE, datetimeNoTimeZone);
>             }
>         }
>     ...
> {code}
> This is not the first time that SDF has caused problems: TIKA-495, TIKA-864.
> In the discussion there, the idea of using alternative thread-safe (and
> faster) formatters from either Joda-Time or Commons Lang was dismissed
> because they would add too many dependencies. Given that Tika already has a
> fairly large laundry list of dependencies to parse content, adding one more
> JAR to make sure things don't break is probably a good idea.
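For illustration only, here is a minimal sketch of one way to get thread-safe formatting without adding a dependency: give each thread its own SimpleDateFormat through a ThreadLocal. The class name ThreadSafeDateFormatting and the method formatNoTimeZone are hypothetical and are not taken from Tika; this is not necessarily the change that was merged.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper, not part of Tika.
final class ThreadSafeDateFormatting {
    // Each thread lazily creates and keeps its own SimpleDateFormat, so the
    // formatter's mutable internal state is never shared between threads.
    private static final ThreadLocal<SimpleDateFormat> DATE_UNSPECIFIED_TZ =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    return new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
                }
            };

    // Formats the date with the calling thread's private formatter.
    static String formatNoTimeZone(Date date) {
        return DATE_UNSPECIFIED_TZ.get().format(date);
    }
}
```

The anonymous subclass keeps the sketch compatible with Java 7, which is the environment named in the report. The trade-off is one formatter instance per thread that touches the code.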
> In addition, because no timezone or locale is specified by either Tika's
> formatter or the call to com.drew.metadata.Directory, it can wreak havoc
> during randomized testing. Given that the timezone is unknown, why not just
> default it to UTC and let the caller guess the timezone? As it stands, I have
> to reparse all of the dates into UTC to get stable behavior across timezones.
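As a rough sketch of the UTC suggestion above, the formatter can be pinned to an explicit time zone and locale so that its output no longer depends on JVM defaults. The class name UtcDateFormatting and the method formatUtc are hypothetical; the sketch covers only the formatting side and assumes the Date passed in was itself produced consistently.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Tika.
final class UtcDateFormatting {
    // Formats a date as an ISO 8601 local datetime, always interpreted in UTC.
    static String formatUtc(Date date) {
        // A fresh formatter per call means no mutable state is shared across threads.
        SimpleDateFormat format =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }
}
```

With this, formatUtc(new Date(0L)) yields "1970-01-01T00:00:00" regardless of the default time zone or locale, which is the kind of stability randomized tests need.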