[ https://issues.apache.org/jira/browse/TIKA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077584#comment-14077584 ]
ASF GitHub Bot commented on TIKA-1369:
--------------------------------------

GitHub user vilmospapp opened a pull request:

    https://github.com/apache/tika/pull/15

    TIKA-1369 Resolve thread safety issue in ImageMetadataExtractor

    Hi,

    This fix tries to resolve TIKA-1369 by handling thread safety with a
    ThreadLocal, which avoids adding other library dependencies. I have run
    the test cases and they pass, so it seems correct to me. However, I
    haven't found any other occurrence of ThreadLocal in Tika's source, so
    perhaps it goes against your general patterns.

    Regards,
    Vilmos

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vilmospapp/tika TIKA-1369

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/15.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15

----
commit 3a9575fc56a6463b4378b14820e9079352bb1848
Author: Vilmos Papp <papp.gyorgy.vil...@gmail.com>
Date:   2014-07-23T09:18:50Z

    TIKA-1369 Make SimpleDateFormat usage thread safe

----

> Date parsing and thread safety in ImageMetadataExtractor
> --------------------------------------------------------
>
>                 Key: TIKA-1369
>                 URL: https://issues.apache.org/jira/browse/TIKA-1369
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>         Environment: OS X 10.9.4 Java 7_60
>            Reporter: John Gibson
>            Priority: Critical
>
> The {{ImageMetadataExtractor}} uses a static instance of
> {{SimpleDateFormat}}. This is not thread safe.
> {code:title=ImageMetadataExtractor.java}
>     static class ExifHandler implements DirectoryHandler {
>         private static final SimpleDateFormat DATE_UNSPECIFIED_TZ =
>                 new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
>         ...
>         public void handleDateTags(Directory directory, Metadata metadata)
>                 throws MetadataException {
>             // Date/Time Original overrides value from ExifDirectory.TAG_DATETIME
>             Date original = null;
>             if (directory.containsTag(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)) {
>                 original = directory.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL);
>                 // Unless we have GPS time we don't know the time zone so date must be set
>                 // as ISO 8601 datetime without timezone suffix (no Z or +/-)
>                 if (original != null) {
>                     String datetimeNoTimeZone = DATE_UNSPECIFIED_TZ.format(original); // Same time zone as Metadata Extractor uses
>                     metadata.set(TikaCoreProperties.CREATED, datetimeNoTimeZone);
>                     metadata.set(Metadata.ORIGINAL_DATE, datetimeNoTimeZone);
>                 }
>             }
>         ...
> {code}
> This is not the first time that SDF has caused problems: TIKA-495, TIKA-864.
> In the discussion there, the idea of using alternative thread-safe (and
> faster) formatters from either Joda time or Commons Lang was dismissed
> because they would add too many dependencies. Given that Tika already has a
> fairly large laundry list of dependencies to parse content, adding one more
> JAR to make sure things don't break is probably a good idea.
> In addition, because no timezone or locale is specified by either Tika's
> formatter or the call to com.drew.metadata.Directory, it can wreak havoc
> during randomized testing. Given that the timezone is unknown, why not just
> default it to UTC and let the caller guess the timezone? As it stands I have
> to reparse all of the dates into UTC to get stable behavior across timezones.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
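
For readers following the thread, here is a minimal sketch of the ThreadLocal approach the pull request describes. It is not the actual patch: the class name ThreadSafeDateFormatSketch and the method formatNoTimeZone are hypothetical, and pinning the formatter to UTC and Locale.ROOT is an assumption taken from the reporter's suggestion rather than from Tika's code. Pinning the formatter alone only gives stable output if the date was parsed with the same time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration class; not part of Tika.
final class ThreadSafeDateFormatSketch {

    // Each thread lazily gets its own SimpleDateFormat, so no instance is shared.
    private static final ThreadLocal<SimpleDateFormat> DATE_UNSPECIFIED_TZ =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    // Assumption: fixed locale and UTC, as suggested in the issue.
                    SimpleDateFormat format =
                            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
                    format.setTimeZone(TimeZone.getTimeZone("UTC"));
                    return format;
                }
            };

    // Formats a date as ISO 8601 without a time zone suffix.
    static String formatNoTimeZone(Date date) {
        return DATE_UNSPECIFIED_TZ.get().format(date);
    }

    public static void main(String[] args) throws InterruptedException {
        final Date epoch = new Date(0L);
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    // Every worker uses its own formatter instance.
                    System.out.println(formatNoTimeZone(epoch));
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}
```

The trade-off is one formatter instance per thread that touches the code. In long-lived thread pools those instances stay alive as long as the threads do, which may be why the pattern is uncommon in some code bases.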