Tim Allison created NUTCH-3000:
----------------------------------
Summary: protocol-selenium returns only the body, strips off the <head/> element
Key: NUTCH-3000
URL: https://issues.apache.org/jira/browse/NUTCH-3000
Project: Nutch
Issue Type: Bug
Components: protocol
Reporter: Tim Allison
The selenium protocol returns only the body portion of the HTML, so neither
the title nor the other page metadata in the <head/> section gets extracted.
{noformat}
String innerHtml = driver.findElement(By.tagName("body"))
        .getAttribute("innerHTML");
{noformat}
We should return the full HTML, no?
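
To illustrate the effect, here is a minimal stand-alone sketch. It does not use Selenium or Nutch; the class HeadLossDemo and both helper methods are hypothetical names made up for this example, and the regexes are only a rough stand-in for a real HTML parser. It mimics what taking the body's innerHTML hands to downstream parsing and shows that the title can no longer be found.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-alone simulation of the reported behavior; no Selenium or Nutch classes involved.
class HeadLossDemo {

    private static final Pattern BODY =
        Pattern.compile("<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TITLE =
        Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Mimics body.getAttribute("innerHTML"): returns only what sits inside <body>.
    static String bodyInnerHtml(String fullHtml) {
        Matcher m = BODY.matcher(fullHtml);
        return m.find() ? m.group(1) : "";
    }

    // Hypothetical stand-in for a downstream parser's title extraction.
    static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Example page</title>"
            + "<meta name=\"description\" content=\"demo\"></head>"
            + "<body><h1>Hello</h1></body></html>";

        // Full document: the title is available.
        System.out.println(extractTitle(page));
        // Body-only content, as the plugin currently returns: the title is gone.
        System.out.println(extractTitle(bodyInnerHtml(page)));
    }
}
```

In Selenium itself, WebDriver.getPageSource() is one existing call that returns the source of the whole page including <head/>; whether it is the right replacement for this plugin is an assumption that would need checking against the plugin code.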