[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764802#comment-17764802 ]
Hudson commented on NUTCH-3001:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #110 (See [https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/110/])
NUTCH-3001 - fix logic for grabbing bytes if there's no content type in the header (tallison: [https://github.com/apache/nutch/commit/b6f645a4d025fa136f557dd37e9aba611b425fbb])
* (edit) src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/HttpResponse.java


> protocol-selenium requires Content-Type header
> -----------------------------------------------
>
> Key: NUTCH-3001
> URL: https://issues.apache.org/jira/browse/NUTCH-3001
> Project: Nutch
> Issue Type: Bug
> Reporter: Tim Allison
> Priority: Minor
>
> It looks like the Selenium protocol requires that there be a Content-Type header.
> The logic seems to be: if the content type is HTML or XHTML, use Selenium; otherwise, just grab the bytes.
> However, with the current logic, if the content type is null, nothing is pulled.
> My guess is that the logic should be: if the content type is not null and contains HTML or XHTML, use Selenium; otherwise, grab the bytes.
> Right?
> {noformat}
> String contentType = getHeader(Response.CONTENT_TYPE);
> // handle with Selenium only if content type in HTML or XHTML
> if (contentType != null) {
>   if (contentType.contains("text/html")
>       || contentType.contains("application/xhtml")) {
>     readPlainContent(url);
>   } else {
> ...
> {noformat}
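The branching the reporter proposes for NUTCH-3001 can be sketched as a small standalone Java class. This is a hypothetical illustration, not the code from the linked commit: the class `ContentTypeDispatch`, the enum `FetchStrategy`, and the method `choose` are invented names. In the real `HttpResponse`, the `SELENIUM` branch would correspond to `readPlainContent(url)` and the `RAW_BYTES` branch to the existing byte-reading path. The only point the sketch makes is that a missing header falls through to reading the bytes instead of skipping the fetch.

```java
/**
 * Hypothetical sketch of the dispatch proposed in NUTCH-3001 (names are
 * illustrative, not Nutch API): render with Selenium only when a
 * Content-Type header is present and mentions HTML or XHTML; in every
 * other case, including a missing header, read the raw response bytes.
 */
public class ContentTypeDispatch {

  /** Hypothetical labels for the two code paths in HttpResponse. */
  public enum FetchStrategy { SELENIUM, RAW_BYTES }

  public static FetchStrategy choose(String contentType) {
    // Same substring checks as the quoted excerpt, but now only guarding
    // the Selenium branch rather than the whole fetch.
    if (contentType != null
        && (contentType.contains("text/html")
            || contentType.contains("application/xhtml"))) {
      return FetchStrategy.SELENIUM;
    }
    // A null header (or any non-HTML type) falls through to the raw bytes.
    return FetchStrategy.RAW_BYTES;
  }

  public static void main(String[] args) {
    String[] samples = {
      "text/html; charset=UTF-8", "application/xhtml+xml", "application/pdf", null
    };
    for (String s : samples) {
      System.out.println(s + " -> " + choose(s));
    }
  }
}
```

Like the quoted excerpt, the sketch matches with case-sensitive `contains`. Normalizing case first would be a separate change that the issue does not discuss.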