[
https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-3001:
-------------------------------
Description:
It looks like protocol-selenium requires the response to carry a Content-Type
header.
The intended logic seems to be: if the content type is HTML or XHTML, use
Selenium; otherwise just grab the bytes.
However, with the current logic, if the content type is null, nothing is
pulled.
My guess is that the logic should be: if the content type is not null and
is HTML or XHTML, use Selenium; otherwise grab the bytes.
Right?
{noformat}
String contentType = getHeader(Response.CONTENT_TYPE);
// handle with Selenium only if content type in HTML or XHTML
if (contentType != null) {
{noformat}
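A minimal sketch of the proposed decision, for discussion only. The class and method names are hypothetical and not taken from the Nutch source, and the media types text/html and application/xhtml+xml are my assumption about what "HTML or XHTML" means here. The point is that a null content type should take the "grab the bytes" branch rather than skip both branches.

```java
import java.util.Locale;

public class SeleniumContentTypeCheck {

  // Hypothetical helper (not part of Nutch): returns true only when a
  // Content-Type header is present and names HTML or XHTML.
  public static boolean shouldUseSelenium(String contentType) {
    if (contentType == null) {
      // Missing header: do not use Selenium, fall back to the raw bytes.
      return false;
    }
    // Drop parameters such as "; charset=UTF-8" before comparing.
    String mediaType =
        contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
    // Assumed media types for HTML and XHTML.
    return mediaType.equals("text/html")
        || mediaType.equals("application/xhtml+xml");
  }

  public static void main(String[] args) {
    String[] samples = {
      null, "text/html; charset=UTF-8", "application/xhtml+xml", "application/pdf"
    };
    for (String sample : samples) {
      String route = shouldUseSelenium(sample) ? "selenium" : "raw bytes";
      System.out.println(sample + " -> " + route);
    }
  }
}
```

With a single boolean like this, the caller has exactly two branches (Selenium or raw bytes), so there is no third path where a missing header leaves the content empty.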
was:
It looks like the selenium protocol requires that there be content-type.
The logic seems to be: If the content type is html or xhtml, use selenium,
otherwise just grab the bytes.
If the content-type is null, nothing is pulled.
My guess is that the logic should be : if the content type is not null and
equals html or xhtml use selenium, otherwise grab the bytes.
Right?
{noformat}
String contentType = getHeader(Response.CONTENT_TYPE);
// handle with Selenium only if content type in HTML or XHTML
if (contentType != null) {
{noformat}
> protocol-selenium requires Content-Type header
> -----------------------------------------------
>
> Key: NUTCH-3001
> URL: https://issues.apache.org/jira/browse/NUTCH-3001
> Project: Nutch
> Issue Type: Bug
> Reporter: Tim Allison
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)