[
https://issues.apache.org/jira/browse/NUTCH-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173264#comment-17173264
]
ASF GitHub Bot commented on NUTCH-2814:
---------------------------------------
sebastian-nagel opened a new pull request #546:
URL: https://github.com/apache/nutch/pull/546
- reset time zone to GMT after parsing a date
- add unit test
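
For readers of the archive, the idea behind the first item can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the pull request: the class name GmtHttpDateSketch, its methods format and parse, and the date pattern are hypothetical. The one thing it relies on is documented behaviour of java.text.DateFormat, namely that parse() may overwrite the formatter's time zone, so the zone is set back to GMT after every parse.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper for illustration, not the actual Nutch HttpDateFormat. */
final class GmtHttpDateSketch {

  private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

  // Assumed pattern for HTTP dates; SimpleDateFormat is not thread-safe,
  // so all access below is synchronized.
  private static final SimpleDateFormat FORMAT =
      new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);

  static {
    FORMAT.setTimeZone(GMT);
  }

  private GmtHttpDateSketch() {
  }

  /** Formats an epoch timestamp (milliseconds) as an HTTP date in GMT. */
  static synchronized String format(long epochMillis) {
    return FORMAT.format(new Date(epochMillis));
  }

  /** Parses an HTTP date and restores GMT on the shared formatter afterwards. */
  static synchronized Date parse(String httpDate) {
    try {
      return FORMAT.parse(httpDate);
    } catch (ParseException e) {
      throw new IllegalArgumentException("Not a parsable HTTP date: " + httpDate, e);
    } finally {
      // parse() may have replaced the formatter's time zone with the parsed
      // one, so reset it before the next call formats a date.
      FORMAT.setTimeZone(GMT);
    }
  }

  public static void main(String[] args) {
    parse("Fri, 28 Feb 2020 03:09:16 PST");
    // Formatting after parsing a non-GMT date still uses GMT.
    System.out.println(format(0L));
  }
}
```

Resetting in a finally block keeps the formatter in GMT even when parsing fails part-way through.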
> HttpDateFormat's internal time zone may change after parsing a date
> -------------------------------------------------------------------
>
> Key: NUTCH-2814
> URL: https://issues.apache.org/jira/browse/NUTCH-2814
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 1.17
> Reporter: Sebastian Nagel
> Priority: Major
> Fix For: 1.18
>
>
> In the Common Crawl WARC files I've observed that the If-Modified-Since
> request header is sent with varying time zones:
> {noformat}
> If-Modified-Since: Tue, 25 Feb 2020 03:33:21 MSK
> If-Modified-Since: Sun, 22 Sep 2019 04:41:48 GMT
> If-Modified-Since: Mon, 18 Nov 2019 12:06:19 KRAT
> If-Modified-Since: Tue, 21 Jan 2020 02:10:22 UTC
> If-Modified-Since: Fri, 18 Oct 2019 20:23:57 BST
> If-Modified-Since: Sun, 20 Oct 2019 08:39:26 CEST
> If-Modified-Since: Fri, 15 Nov 2019 12:56:38 EST
> If-Modified-Since: Mon, 30 Mar 2020 09:10:33 GMT
> If-Modified-Since: Mon, 30 Mar 2020 05:18:36 GMT
> If-Modified-Since: Fri, 28 Feb 2020 03:09:16 PST
> If-Modified-Since: Thu, 21 Nov 2019 10:16:19 YEKT
> If-Modified-Since: Thu, 14 Nov 2019 18:01:05 EET
> If-Modified-Since: Thu, 14 Nov 2019 16:46:43 UTC
> If-Modified-Since: Sun, 17 Nov 2019 13:14:28 UTC
> If-Modified-Since: Tue, 25 Feb 2020 21:46:10 GMT
> If-Modified-Since: Wed, 16 Oct 2019 19:03:31 UTC
> If-Modified-Since: Thu, 14 Nov 2019 09:07:13 EST
> If-Modified-Since: Thu, 09 Apr 2020 12:21:53 EEST
> If-Modified-Since: Sat, 28 Mar 2020 19:08:52 CET
> If-Modified-Since: Sun, 23 Feb 2020 12:22:46 CET
> If-Modified-Since: Mon, 21 Oct 2019 03:18:16 PDT
> If-Modified-Since: Fri, 15 Nov 2019 05:41:44 UTC
> If-Modified-Since: Thu, 09 Apr 2020 21:01:32 CEST
> If-Modified-Since: Wed, 11 Dec 2019 11:18:28 KRAT
> If-Modified-Since: Tue, 22 Oct 2019 18:55:54 GMT
> {noformat}
> This happens because the time zone of HttpDateFormat's internal
> SimpleDateFormat may change when a date is parsed. The next call that
> formats a date then uses the time zone of the last parsed date.
> The usage of "GMT" as time zone is specified in [sec. 7.1.1.1 of RFC
> 7231|https://tools.ietf.org/html/rfc7231#section-7.1.1.1].
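
The behaviour described in the issue can be reproduced with a few lines of plain Java. The sketch below is an assumption-laden illustration: the class name TimeZoneLeakDemo, the method zoneIdAfterParsing and the date pattern are made up for the example and are not taken from Nutch. It sets a SimpleDateFormat to GMT, parses one date, and reports which time zone the formatter holds afterwards.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical reproduction for illustration, not code from Nutch. */
final class TimeZoneLeakDemo {

  // Assumed pattern for HTTP dates with a time zone abbreviation.
  private static final String PATTERN = "EEE, dd MMM yyyy HH:mm:ss zzz";

  private TimeZoneLeakDemo() {
  }

  /** Returns the ID of the formatter's time zone after parsing one date. */
  static String zoneIdAfterParsing(String httpDate) {
    SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.US);
    format.setTimeZone(TimeZone.getTimeZone("GMT"));
    try {
      format.parse(httpDate);
    } catch (ParseException e) {
      throw new IllegalArgumentException("Not a parsable HTTP date: " + httpDate, e);
    }
    // No reset here: whatever parse() left behind is reported.
    return format.getTimeZone().getID();
  }

  public static void main(String[] args) {
    // A GMT date leaves the formatter in GMT.
    System.out.println(zoneIdAfterParsing("Sun, 22 Sep 2019 04:41:48 GMT"));
    // A date in another zone may switch the formatter to that zone; the exact
    // ID depends on the JDK's time zone data.
    System.out.println(zoneIdAfterParsing("Fri, 28 Feb 2020 03:09:16 PST"));

    // Consequence: a formatter that has parsed a PST date no longer formats in GMT.
    SimpleDateFormat shared = new SimpleDateFormat(PATTERN, Locale.US);
    shared.setTimeZone(TimeZone.getTimeZone("GMT"));
    try {
      shared.parse("Fri, 28 Feb 2020 03:09:16 PST");
    } catch (ParseException e) {
      throw new IllegalArgumentException(e);
    }
    System.out.println(shared.format(new Date(0L)));
  }
}
```

A formatter that is shared between parsing and formatting, as in the last lines of main, is the situation in which a non-GMT If-Modified-Since value can end up in a request.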