[ 
https://issues.apache.org/jira/browse/NUTCH-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641153#comment-16641153
 ] 

Hudson commented on NUTCH-2642:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1616 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1616/])
NUTCH-2642 MoreIndexingFilter parses ISO 8601 UTC dates in local time (snagel: 
[https://github.com/apache/nutch/commit/9dc57fb8f1de2c37ee33622a3638f5ec61a803a4])
* (edit) 
src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
* (edit) 
src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java


> MoreIndexingFilter parses ISO 8601 UTC dates in local time zone
> ---------------------------------------------------------------
>
>                 Key: NUTCH-2642
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2642
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer, plugin
>    Affects Versions: 2.3.1, 1.14, 1.15
>            Reporter: John Lacey
>            Priority: Minor
>             Fix For: 2.4, 1.16
>
>
> The ISO 8601 pattern in MoreIndexingFilter.getTime is 
> "yyyy-MM-dd'T'HH:mm:ss'Z'". Note the literal Z.
> [https://github.com/apache/nutch/blob/b834b81/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L142]
> Apache commons-lang's DateUtils uses the local time zone by default when 
> parsing, and can't tell that a string matching this pattern is specifying an 
> offset because the pattern doesn't have an offset, just a literal "Z":
> [https://github.com/apache/commons-lang/blob/b610707/src/main/java/org/apache/commons/lang3/time/DateUtils.java#L370]
> So, when parsing a date string such as "2018-09-04T12:34:56Z", the time is 
> returned as a local time:
> DateUtils.parseDate("2018-09-04T12:34:56Z", new String[] { 
> "yyyy-MM-dd'T'HH:mm:ss'Z'" })
> => Tue Sep 04 12:34:56 PDT 2018 (1536089696000)
> I think a reasonable fix would be to specify an offset pattern instead of a 
> literal "Z": "yyyy-MM-dd'T'HH:mm:ssXXX". That would also allow arbitrary 
> offsets, as well as "Z":
> DateUtils.parseDate("2018-09-04T12:34:56Z", new String[] { 
> "yyyy-MM-dd'T'HH:mm:ssXXX" })
> => Tue Sep 04 05:34:56 PDT 2018 (1536064496000)
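
The difference between the two patterns quoted above can be reproduced with a minimal sketch. It rests on some assumptions: it uses the JDK's java.text.SimpleDateFormat instead of commons-lang's DateUtils, on the assumption that both treat these two patterns the same way, and it pins the parser's zone to America/Los_Angeles to mimic the reporter's PDT environment. The class name Iso8601ZoneDemo and the helper parseMillis are hypothetical and are not part of Nutch or commons-lang.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class Iso8601ZoneDemo {

    // Hypothetical helper: parses text with the given pattern and returns epoch millis.
    // The zone argument stands in for the JVM default zone, which the parser
    // falls back to when the pattern carries no offset field.
    static long parseMillis(String text, String pattern, TimeZone zone) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(zone);
        format.setLenient(false);
        try {
            return format.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        TimeZone pacific = TimeZone.getTimeZone("America/Los_Angeles");
        String input = "2018-09-04T12:34:56Z";

        // Literal 'Z': the trailing Z is matched as plain text, so the time
        // is interpreted in the parser's zone (PDT here).
        long literalZ = parseMillis(input, "yyyy-MM-dd'T'HH:mm:ss'Z'", pacific);

        // XXX: the trailing Z is read as an ISO 8601 offset meaning UTC.
        long offsetX = parseMillis(input, "yyyy-MM-dd'T'HH:mm:ssXXX", pacific);

        // XXX also accepts explicit offsets; this is the same instant as above.
        long explicit = parseMillis("2018-09-04T14:34:56+02:00",
                "yyyy-MM-dd'T'HH:mm:ssXXX", pacific);

        System.out.println(literalZ);  // 1536089696000
        System.out.println(offsetX);   // 1536064496000
        System.out.println(explicit);  // 1536064496000
    }
}
```

The two results differ by seven hours, the PDT offset from UTC on that date, which matches the values quoted in the report.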



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
