This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.


    from 4bcaeeb  Merge pull request #328 from sebastian-nagel/nutch-2576-protocol-okhttp
     add 4cf9682  NUTCH-2549 protocol-http does not behave the same as browsers
                  - add unit test class to emulate bad HTTP server sending
                    erroneous HTTP headers, etc.
                  - add unit tests for processing of chunked content
                    (test NUTCH-2562 and NUTCH-2575)
     add 6239655  NUTCH-2555 URL normalization problem: path not starting with a '/'
                  For URLs with query and an empty path (http://example.com?a=1):
                  - fix urlnormalizer-basic to add the missing slash
                    (http://example.com/?a=1)
                  - fix protocol-http to send a correct "GET /?a=1 ..." request
     add 73d082e  NUTCH-2556 protocol-http makes invalid HTTP/1.0 requests
                  - use HTTP/1.1 by default
                    (setting http.useHttp11 = false will send HTTP/1.0 requests)
     add 957306a  NUTCH-2564 protocol-http throws an error when the
                  content-length header is not a number
                  - ignore invalid Content-Length header
                    (log warning instead of throwing exception)
     add 9e212a2  NUTCH-2559 protocol-http cannot handle colons after the HTTP
                  status code (patch contributed by Gerard Bouchar)
     add 146a76c  NUTCH-2558 protocol-http cannot handle a missing HTTP status line
                  NUTCH-2561 protocol-http can be made to read arbitrarily large
                  HTTP responses
                  - if parsing HTTP status line fails: log warning, push back input,
                    assume status 200 OK (patch contributed by Gerard Bouchar)
                  - limit max. length of HTTP header lines
                    - 2 kB for status line
                    - Http.BUFFER_SIZE (8 kB) for HTTP header field lines
                    - throw exception if header line is longer than limit
                  - fix encoding when pushi [...]
     add 381e82f  NUTCH-2563 HTTP header spellchecking issues
                  ("Client-Transfer-Encoding" erroneously corrected to
                  "Transfer-Encoding")
                  - limit max. Levenshtein distance to 3 edit operations
                  - add "Client-Transfer-Encoding" to known header fields
     add d163512  NUTCH-2557 protocol-http fails to follow redirections when
                  HTTP response body is invalid (patch contributed by Gerard Bouchar)
                  - catch exceptions while reading payload
                  - if response code is not "200 OK": ignore exception but reset content
     add a2771dc  NUTCH-2560 protocol-http throws an error when an http header
                  spans over multiple lines
                  - add unit test to verify that multi-line headers are correctly parsed
     add 2e485cf  NUTCH-2549 protocol-http does not behave the same as browsers
                  - be conformant with RFC 7230 and signal that connection is closed
                    after response (patch contributed by Gerard Bouchar)
     new 106df96  Merge pull request #347 from sebastian-nagel/NUTCH-2549-protocol-http-fixes
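
For readers unfamiliar with NUTCH-2555, the empty-path case can be
illustrated roughly as follows. This is only a sketch built on java.net.URI,
not the code in BasicURLNormalizer; the class and method names
(EmptyPathSketch, addMissingSlash) are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration of the NUTCH-2555 case; not the Nutch implementation.
class EmptyPathSketch {

  /** Inserts "/" when a hierarchical URL has an authority but an empty path. */
  static String addMissingSlash(String url) {
    URI uri;
    try {
      uri = new URI(url);
    } catch (URISyntaxException e) {
      return url; // assumption: leave unparsable input untouched
    }
    String path = uri.getRawPath();
    // Nothing to do for opaque/relative URIs or when a path is already present.
    if (uri.getScheme() == null || uri.getRawAuthority() == null
        || (path != null && !path.isEmpty())) {
      return url;
    }
    StringBuilder sb = new StringBuilder();
    sb.append(uri.getScheme()).append("://").append(uri.getRawAuthority()).append('/');
    if (uri.getRawQuery() != null) {
      sb.append('?').append(uri.getRawQuery());
    }
    if (uri.getRawFragment() != null) {
      sb.append('#').append(uri.getRawFragment());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(addMissingSlash("http://example.com?a=1"));
  }
}
```

With the slash in place, the path and query can be used directly as the
request target ("GET /?a=1 ...").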

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
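
As an illustration of the NUTCH-2564 behaviour (ignore an invalid
Content-Length and log a warning instead of throwing), a lenient parser
might look like the sketch below. The names (ContentLengthSketch,
parseContentLength) and the use of -1 for "length unknown" are assumptions
made for this example, not taken from HttpResponse.java.

```java
// Hypothetical illustration of lenient Content-Length handling; not the Nutch code.
class ContentLengthSketch {

  /** Returns the parsed length, or -1 (assumed "unknown") if the value is not a number. */
  static int parseContentLength(String headerValue) {
    if (headerValue == null) {
      return -1;
    }
    try {
      return Integer.parseInt(headerValue.trim());
    } catch (NumberFormatException e) {
      // Warn and carry on rather than failing the whole fetch.
      System.err.println("WARN: ignoring invalid Content-Length: " + headerValue);
      return -1;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseContentLength("1234"));
    System.out.println(parseContentLength("not-a-number"));
  }
}
```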


Summary of changes:
 conf/nutch-default.xml                             |   6 +-
 .../org/apache/nutch/metadata/HttpHeaders.java     |   2 +
 .../nutch/metadata/SpellCheckedMetadata.java       |   7 +-
 .../apache/nutch/protocol/http/HttpResponse.java   | 161 ++++++++++++++-------
 .../src/test/conf/nutch-site-test.xml              |   8 +-
 .../protocol/http}/TestBadServerResponses.java     |  13 +-
 .../urlnormalizer/basic/BasicURLNormalizer.java    |  20 ++-
 .../basic/TestBasicURLNormalizer.java              |   2 +
 8 files changed, 143 insertions(+), 76 deletions(-)
 copy src/plugin/{protocol-okhttp/src/test/org/apache/nutch/protocol/okhttp => protocol-http/src/test/org/apache/nutch/protocol/http}/TestBadServerResponses.java (97%)
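
To show why the Levenshtein limit from NUTCH-2563 matters, the sketch below
computes plain edit distances and only "corrects" a header name when a known
name is within 3 edits. It is an illustration only: the class and method
names are hypothetical, and the case-insensitive comparison is an assumption
that may differ from how SpellCheckedMetadata normalizes names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of distance-limited header correction; not the Nutch code.
class HeaderSpellCheckSketch {

  static final int MAX_DISTANCE = 3; // limit named in the NUTCH-2563 commit message

  /** Classic two-row dynamic-programming Levenshtein distance. */
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** Returns the closest known name within MAX_DISTANCE edits, else the name unchanged. */
  static String correct(String name, List<String> known) {
    String best = name;
    int bestDistance = MAX_DISTANCE + 1;
    String lower = name.toLowerCase(Locale.ROOT);
    for (String candidate : known) {
      int d = levenshtein(lower, candidate.toLowerCase(Locale.ROOT));
      if (d < bestDistance) {
        bestDistance = d;
        best = candidate;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<String> known = Arrays.asList("Transfer-Encoding", "Content-Length");
    System.out.println(correct("Content-Lenght", known));
    System.out.println(correct("Client-Transfer-Encoding", known));
  }
}
```

"Client-Transfer-Encoding" differs from "Transfer-Encoding" by the 7-character
prefix "Client-", so with a limit of 3 it is left alone, while a small typo
such as "Content-Lenght" (2 edits) is still corrected.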

-- 
To stop receiving notification emails like this one, please contact
sna...@apache.org.
