[ https://issues.apache.org/jira/browse/NUTCH-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490789#comment-16490789 ]
Sebastian Nagel commented on NUTCH-2557:
----------------------------------------

The name is arbitrary. But it's always hard to find one which is descriptive but not too specific. What about {{http.content.store.always}}? This would include redirects, 404, not modified and further HTTP status codes.

But it's your decision to select a suitable name. Thanks!

> protocol-http fails to follow redirections when an HTTP response body is invalid
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-2557
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2557
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Gerard Bouchar
>            Priority: Major
>
> If a server sends a redirection (3XX status code, with a Location header),
> protocol-http tries to parse the HTTP response body anyway. Thus, if an error
> occurs while decoding the body, the redirection is not followed and the
> information is lost. Browsers follow the redirection and close the socket as
> soon as they can.
> * Example: this page is a redirection to its https version, with an HTTP
> body containing invalid gzip-encoded content. Browsers follow the
> redirection, but Nutch throws an error:
> ** [http://www.webarcelona.net/es/blog?page=2]
>
> The HttpResponse::getContent method can already return null. I think it should
> at least return null when parsing the HTTP response body fails.
> Ideally, we would adopt the same behavior as browsers, and not even try
> parsing the body when the headers indicate a redirection.
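
For illustration, the behavior proposed in the issue description could be sketched roughly as follows. This is only a minimal, self-contained sketch and not the actual protocol-http code: the class {{RedirectBodySketch}} and the methods {{isRedirect}} and {{decodeBody}} are hypothetical names, and it assumes the body is gzip-encoded and already fully read into a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch, not the real Nutch HttpResponse implementation.
final class RedirectBodySketch {

  /** True for a 3XX status code that comes with a Location header. */
  static boolean isRedirect(int status, String location) {
    return status >= 300 && status < 400 && location != null;
  }

  /**
   * Returns the decoded body, or null if the response is a redirect
   * or if the gzip stream cannot be decoded.
   */
  static byte[] decodeBody(int status, String location, byte[] rawGzipBody) {
    if (isRedirect(status, location)) {
      return null; // like browsers: do not even try to parse the body
    }
    try (GZIPInputStream in =
             new GZIPInputStream(new ByteArrayInputStream(rawGzipBody));
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      // Invalid body: give up on the content, but do not fail the fetch,
      // so that the status code and headers remain usable.
      return null;
    }
  }
}
```

With this shape, a caller still sees the status code and the Location header when the body is broken, so the redirection can be followed. A 304 without a Location header is not treated as a redirect here.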