[ https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714510#comment-16714510 ]
Sebastian Nagel commented on NUTCH-2676:
----------------------------------------

[~virt], thanks for the update.

There is already an option to [white list hosts|https://wiki.apache.org/nutch/WhiteListRobots/] (NUTCH-1927). After a longer discussion we agreed on this: it makes it easy to ignore the robots.txt for a list of hosts where you're allowed to do so, but it would still require a change in the source code if anybody wants to generally ignore the robots.txt standard. It's implemented in lib-http and should be available for protocol-selenium as well (but I never tested it here).

> Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2676
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2676
>             Project: Nutch
>          Issue Type: New Feature
>          Components: protocol
>    Affects Versions: 1.15
>            Reporter: Stas Batururimi
>            Priority: Major
>             Fix For: 1.16
>
>         Attachments: Screenshot 2018-11-16 at 18.15.52.png
>
>
> * Selenium needs to be updated
> * missing remote web driver for chrome
> * necessity to add headless mode for both remote WebDriverBase Firefox & Chrome
> * use case with Selenium grid using docker (1 hub docker container, several nodes in different docker containers, Nutch in another docker container, streaming to Apache Solr in docker container, that is at least 4 different docker containers)