[ https://issues.apache.org/jira/browse/NUTCH-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641156#comment-16641156 ]
Hudson commented on NUTCH-2642:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch-trunk #3556 (See [https://builds.apache.org/job/Nutch-trunk/3556/])
NUTCH-2642 MoreIndexingFilter parses ISO 8601 UTC dates in local time (snagel: [https://github.com/apache/nutch/commit/d3864d66c0d859ce93b51d9bc13e4f912b36c0f1])
* (edit) src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
* (edit) src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java


> MoreIndexingFilter parses ISO 8601 UTC dates in local time zone
> ---------------------------------------------------------------
>
>                 Key: NUTCH-2642
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2642
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer, plugin
>    Affects Versions: 2.3.1, 1.14, 1.15
>            Reporter: John Lacey
>            Priority: Minor
>             Fix For: 2.4, 1.16
>
>
> The ISO 8601 pattern in MoreIndexingFilter.getTime is
> "yyyy-MM-dd'T'HH:mm:ss'Z'". Note the literal Z.
> [https://github.com/apache/nutch/blob/b834b81/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L142]
> Apache commons-lang's DateUtils uses the local time zone by default when
> parsing, and can't tell that a string matching this pattern is specifying an
> offset because the pattern doesn't have an offset, just a literal "Z":
> [https://github.com/apache/commons-lang/blob/b610707/src/main/java/org/apache/commons/lang3/time/DateUtils.java#L370]
> So, when parsing a date string such as "2018-09-04T12:34:56Z", the time is
> returned as a local time:
> DateUtils.parseDate("2018-09-04T12:34:56Z", new String[] { "yyyy-MM-dd'T'HH:mm:ss'Z'" })
> => Tue Sep 04 12:34:56 PDT 2018 (1536089696000)
> I think a reasonable fix would be to specify an offset pattern instead of a
> literal "Z": "yyyy-MM-dd'T'HH:mm:ssXXX".
> That would also allow arbitrary offsets, as well as "Z":
> DateUtils.parseDate("2018-09-04T12:34:56Z", new String[] { "yyyy-MM-dd'T'HH:mm:ssXXX" })
> => Tue Sep 04 05:34:56 PDT 2018 (1536064496000)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
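
The difference between the two patterns quoted in the issue can be reproduced with the JDK alone. The sketch below is not the Nutch code and does not call commons-lang: it assumes java.text.SimpleDateFormat as a stand-in for DateUtils.parseDate, and it pins the parser's default zone to America/Los_Angeles so that the PDT results from the report are repeatable on any machine. The class name Iso8601ZoneDemo and the helper parseMillis are hypothetical names chosen for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

class Iso8601ZoneDemo {

    // Hypothetical helper: parse `text` with `pattern` and return epoch millis.
    // `defaultZone` plays the role of the JVM's local time zone, which the
    // parser falls back on whenever the pattern itself carries no offset field.
    static long parseMillis(String text, String pattern, TimeZone defaultZone) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(defaultZone);
        format.setLenient(false);
        try {
            return format.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: a Pacific-time machine, as in the issue report.
        TimeZone local = TimeZone.getTimeZone("America/Los_Angeles");
        String input = "2018-09-04T12:34:56Z";

        // Quoted 'Z' is only a literal character, so the fields are read as local time.
        long literalZ = parseMillis(input, "yyyy-MM-dd'T'HH:mm:ss'Z'", local);

        // XXX is an ISO 8601 offset field; "Z" is understood as UTC.
        long offsetZ = parseMillis(input, "yyyy-MM-dd'T'HH:mm:ssXXX", local);

        // The same pattern also accepts explicit offsets such as +02:00.
        long offsetPlus2 = parseMillis("2018-09-04T14:34:56+02:00", "yyyy-MM-dd'T'HH:mm:ssXXX", local);

        System.out.println(literalZ);    // 1536089696000 (12:34:56 PDT, seven hours late)
        System.out.println(offsetZ);     // 1536064496000 (12:34:56 UTC)
        System.out.println(offsetPlus2); // 1536064496000 (same instant as above)
    }
}
```

The first two values match the ones given in the issue description; the seven-hour gap between them is the PDT offset that the literal-Z pattern silently applies.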