This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git
The following commit(s) were added to refs/heads/master by this push:
     new 383aeca5d NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit
383aeca5d is described below

commit 383aeca5d30342b29b6ee6e05f8f3052c62d7303
Author: Kamil Mroczek <kamil.mroc...@gmail.com>
AuthorDate: Thu Jan 19 23:05:05 2023 -0500

    NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit

    - Removed phantomJS dependency as it wasn't being used and the project has been archived since 2018 - it was causing problems casting TakeScreenshot to HtmlUnitWebDriver
    - Improved README setup instructions for IntelliJ
---
 README.md                                          |  44 ++++-
 src/plugin/lib-htmlunit/ivy.xml                    |  12 +-
 src/plugin/lib-htmlunit/plugin.xml                 | 214 ++++++++++++++++-----
 src/plugin/lib-selenium/ivy.xml                    |   7 +-
 src/plugin/lib-selenium/plugin.xml                 | 170 ++++++++++++++--
 .../nutch/protocol/selenium/HttpWebClient.java     |  28 ---
 .../handlers/DefaultClickAllAjaxLinksHandler.java  |   7 +-
 7 files changed, 361 insertions(+), 121 deletions(-)

diff --git a/README.md b/README.md
index a0ab67bd1..ffd04ae22 100644
--- a/README.md
+++ b/README.md
@@ -40,6 +40,8 @@ To contribute a patch, follow these instructions (note that installing
 IDE setup
 =========
 
+### Eclipse
+
 Generate Eclipse project files
 
 ```
@@ -48,13 +50,45 @@ ant eclipse
 
 and follow the instructions in [Importing existing projects](https://help.eclipse.org/2019-06/topic/org.eclipse.platform.doc.user/tasks/tasks-importproject.htm).
 
-For Intellij IDEA, first install the [IvyIDEA Plugin](https://plugins.jetbrains.com/plugin/3612-ivyidea). then run ```ant eclipse```.
+You must [configure the nutch-site.xml](https://cwiki.apache.org/confluence/display/NUTCH/RunNutchInEclipse) before running. Make sure, you've added ```http.agent.name``` and ```plugin.folders``` properties. The plugin.folders normally points to ```<project_root>/build/plugins```.
+
+Now create a Java Application Configuration, choose org.apache.nutch.crawl.Injector, add two paths as arguments.
First one is the crawldb directory, second one is the URL directory where, the injector can read urls. Now run your configuration. -Then open the project in IntelliJ. You may see popups like "Ant build scripts found", "Frameworks detected - IvyIDEA Framework detected". Just follow the simple steps in these dialogs. +If we still see the ```No plugins found on paths of property plugin.folders="plugins"```, update the plugin.folders in the nutch-default.xml, this is a quick fix, but should not be used. -You must [configure the nutch-site.xml](https://cwiki.apache.org/confluence/display/NUTCH/RunNutchInEclipse) before running. Make sure, you've added ```http.agent.name``` and ```plugin.folders``` properties. The plugin.folders normally points to ```<project_root>/build/plugins```. -Now create a Java Application Configuration, choose org.apache.nutch.crawl.Injector, add two paths as arguments. First one is the crawldb directory, second one is the URL directory where, the injector can read urls. Now run your configuration. +### Intellij IDEA -If we still see the ```No plugins found on paths of property plugin.folders="plugins"```, update the plugin.folders in the nutch-default.xml, this is a quick fix, but should not be used. +First install the [IvyIDEA Plugin](https://plugins.jetbrains.com/plugin/3612-ivyidea). then run ```ant eclipse```. This will create the necessary +.classpath and .project files so that Intellij can import the project in the next step. + +In Intellij IDEA, select File > New > Project from Existing Sources. Select the nutch home directory and click "Open". + +On the "Import Project" screen select the "Import project from external model" radio button and select "Eclipse". +Click "Create". On the next screen the "Eclipse projects directory" should be already set to the nutch folder. +Leave the "Create module files near .classpath files" radio button selected. +Click "Next" on the next screens. 
On the project SDK screen select Java 11 and click "Create". + +Once the project is imported, you will see a popup saying "Ant build scripts found", "Frameworks detected - IvyIDEA Framework detected". Click "Import". +If you don't get the pop-up, I'd suggest going through the steps again as this happens from time to time. There is another +Ant popup that asks you to configure the project. Do NOT click "Configure". + +To import the code-style, Go to Intellij IDEA > Preferences > Editor > Code Style > Java. + +For the Scheme dropdown select "Project". Click the gear icon and select "Import Scheme" > "Eclipse XML file". + +Select the eclipse-format.xml file and click "Open". On next screen check the "Current Scheme" checkbox and hit OK. + +### Running in Intellij IDEA + +Running in Intellij + +- Open Run/Debug Configurations +- Select "+" to create a new configuration and select "Application" +- For "Main Class" enter a class with a main function (e.g. org.apache.nutch.indexer.IndexingJob). +- For "Program Arguments" add the arguments needed for the class. You can get these by running the crawl executable for your job. Use full-qualified paths. (e.g. /Users/kamil/workspace/external/nutch/crawl/crawldb /Users/kamil/workspace/external/nutch/crawl/segments/20221222160141 -deleteGone) +- For "Working Directory" enter "/Users/kamil/workspace/external/nutch/runtime/local". +- Select "Modify options" > "Modify Classpath" and add the config directory belonging to the "Working Directory" from the previous step (e.g. /Users/kamil/workspace/external/nutch/runtime/local/conf). This will allow the resource loader to load that configuration. +- Select "Modify options" > "Add VM Options". Add the VM options needed. You can get these by running the crawl executable for your job (e.g. 
-Xmx4096m -Dhadoop.log.dir=/Users/kamil/workspace/external/nutch/runtime/local/logs -Dhadoop.log.file=hadoop.log -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true) +**Note**: You will need to manually trigger a build through ANT to get latest updated changes when running. This is because the ant build system is separate from the Intellij one. diff --git a/src/plugin/lib-htmlunit/ivy.xml b/src/plugin/lib-htmlunit/ivy.xml index 981774f28..b03211667 100644 --- a/src/plugin/lib-htmlunit/ivy.xml +++ b/src/plugin/lib-htmlunit/ivy.xml @@ -37,16 +37,8 @@ <dependencies> <!-- begin selenium dependencies --> - <dependency org="org.seleniumhq.selenium" name="selenium-java" rev="3.141.59" /> - <dependency org="org.seleniumhq.selenium" name="htmlunit-driver" rev="2.35.1" /> - - <dependency org="com.opera" name="operadriver" rev="1.5"> - <exclude org="org.seleniumhq.selenium" name="selenium-remote-driver" /> - </dependency> - <dependency org="com.codeborne" name="phantomjsdriver" rev="1.2.1" > - <exclude org="org.seleniumhq.selenium" name="selenium-remote-driver" /> - <exclude org="org.seleniumhq.selenium" name="selenium-java" /> - </dependency> + <dependency org="org.seleniumhq.selenium" name="selenium-java" rev="4.7.2" /> + <dependency org="org.seleniumhq.selenium" name="htmlunit-driver" rev="4.7.0" /> <!-- end selenium dependencies --> </dependencies> diff --git a/src/plugin/lib-htmlunit/plugin.xml b/src/plugin/lib-htmlunit/plugin.xml index bdfed927d..95caaa320 100644 --- a/src/plugin/lib-htmlunit/plugin.xml +++ b/src/plugin/lib-htmlunit/plugin.xml @@ -29,157 +29,271 @@ <export name="*"/> </library> <!-- all classes from dependent libraries are exported --> - <library name="animal-sniffer-annotations-1.14.jar"> + <library name="async-http-client-2.12.3.jar"> <export name="*"/> </library> - <library name="byte-buddy-1.8.15.jar"> + <library name="async-http-client-netty-utils-2.12.3.jar"> <export 
name="*"/> </library> - <library name="checker-compat-qual-2.0.0.jar"> + <library name="auto-common-1.2.jar"> <export name="*"/> </library> - <library name="commons-codec-1.11.jar"> + <library name="auto-service-1.0.1.jar"> <export name="*"/> </library> - <library name="commons-exec-1.3.jar"> + <library name="auto-service-annotations-1.0.1.jar"> + <export name="*"/> + </library> + <library name="byte-buddy-1.12.18.jar"> + <export name="*"/> + </library> + <library name="checker-qual-3.12.0.jar"> <export name="*"/> </library> - <library name="commons-io-2.6.jar"> + <library name="commons-codec-1.15.jar"> <export name="*"/> </library> - <library name="commons-jxpath-1.3.jar"> + <library name="commons-exec-1.3.jar"> + <export name="*"/> + </library> + <library name="commons-io-2.10.0.jar"> <export name="*"/> </library> - <library name="commons-lang3-3.9.jar"> + <library name="commons-lang3-3.12.0.jar"> <export name="*"/> </library> <library name="commons-logging-1.2.jar"> <export name="*"/> </library> - <library name="commons-net-3.6.jar"> + <library name="commons-net-3.8.0.jar"> + <export name="*"/> + </library> + <library name="commons-text-1.10.0.jar"> + <export name="*"/> + </library> + <library name="dec-0.1.2.jar"> + <export name="*"/> + </library> + <library name="error_prone_annotations-2.11.0.jar"> + <export name="*"/> + </library> + <library name="failsafe-3.3.0.jar"> + <export name="*"/> + </library> + <library name="failureaccess-1.0.1.jar"> + <export name="*"/> + </library> + <library name="guava-31.1-jre.jar"> + <export name="*"/> + </library> + <library name="htmlunit-2.67.0.jar"> + <export name="*"/> + </library> + <library name="htmlunit-core-js-2.67.0.jar"> + <export name="*"/> + </library> + <library name="htmlunit-cssparser-1.12.0.jar"> + <export name="*"/> + </library> + <library name="htmlunit-driver-4.7.0.jar"> + <export name="*"/> + </library> + <library name="htmlunit-xpath-2.67.0.jar"> + <export name="*"/> + </library> + <library 
name="httpclient-4.5.13.jar"> + <export name="*"/> + </library> + <library name="httpcore-4.4.13.jar"> + <export name="*"/> + </library> + <library name="httpmime-4.5.13.jar"> + <export name="*"/> + </library> + <library name="j2objc-annotations-1.3.jar"> + <export name="*"/> + </library> + <library name="jakarta.activation-1.2.2.jar"> + <export name="*"/> + </library> + <library name="jcommander-1.82.jar"> + <export name="*"/> + </library> + <library name="jetty-client-9.4.49.v20220914.jar"> + <export name="*"/> + </library> + <library name="jetty-http-9.4.49.v20220914.jar"> + <export name="*"/> + </library> + <library name="jetty-io-9.4.49.v20220914.jar"> + <export name="*"/> + </library> + <library name="jetty-util-9.4.49.v20220914.jar"> + <export name="*"/> + </library> + <library name="jsr305-3.0.2.jar"> + <export name="*"/> + </library> + <library name="jtoml-2.0.0.jar"> + <export name="*"/> + </library> + <library name="listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar"> + <export name="*"/> + </library> + <library name="neko-htmlunit-2.67.0.jar"> + <export name="*"/> + </library> + <library name="netty-buffer-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-codec-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-codec-http-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-codec-socks-4.1.60.Final.jar"> + <export name="*"/> + </library> + <library name="netty-common-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-handler-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-handler-proxy-4.1.60.Final.jar"> + <export name="*"/> + </library> + <library name="netty-reactive-streams-2.0.4.jar"> + <export name="*"/> + </library> + <library name="netty-resolver-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-transport-4.1.84.Final.jar"> <export name="*"/> </library> - <library 
name="commons-text-1.6.jar"> + <library name="netty-transport-classes-epoll-4.1.84.Final.jar"> <export name="*"/> </library> - <library name="error_prone_annotations-2.1.3.jar"> + <library name="netty-transport-classes-kqueue-4.1.84.Final.jar"> <export name="*"/> </library> - <library name="guava-25.0-jre.jar"> + <library name="netty-transport-native-epoll-4.1.84.Final.jar"> <export name="*"/> </library> - <library name="htmlunit-2.35.0.jar"> + <library name="netty-transport-native-kqueue-4.1.84.Final.jar"> <export name="*"/> </library> - <library name="htmlunit-core-js-2.35.0.jar"> + <library name="netty-transport-native-unix-common-4.1.84.Final.jar"> <export name="*"/> </library> - <library name="htmlunit-cssparser-1.4.0.jar"> + <library name="opentelemetry-api-1.19.0.jar"> <export name="*"/> </library> - <library name="htmlunit-driver-2.35.1.jar"> + <library name="opentelemetry-api-logs-1.19.0-alpha.jar"> <export name="*"/> </library> - <library name="httpclient-4.5.8.jar"> + <library name="opentelemetry-context-1.19.0.jar"> <export name="*"/> </library> - <library name="httpcore-4.4.11.jar"> + <library name="opentelemetry-exporter-common-1.19.0.jar"> <export name="*"/> </library> - <library name="httpmime-4.5.8.jar"> + <library name="opentelemetry-exporter-logging-1.19.0.jar"> <export name="*"/> </library> - <library name="ini4j-0.5.2.jar"> + <library name="opentelemetry-sdk-1.19.0.jar"> <export name="*"/> </library> - <library name="j2objc-annotations-1.1.jar"> + <library name="opentelemetry-sdk-common-1.19.0.jar"> <export name="*"/> </library> - <library name="jetty-client-9.4.16.v20190411.jar"> + <library name="opentelemetry-sdk-extension-autoconfigure-1.19.0-alpha.jar"> <export name="*"/> </library> - <library name="jetty-http-9.4.16.v20190411.jar"> + <library name="opentelemetry-sdk-extension-autoconfigure-spi-1.19.0.jar"> <export name="*"/> </library> - <library name="jetty-io-9.4.16.v20190411.jar"> + <library 
name="opentelemetry-sdk-logs-1.19.0-alpha.jar"> <export name="*"/> </library> - <library name="jetty-util-9.4.16.v20190411.jar"> + <library name="opentelemetry-sdk-metrics-1.19.0.jar"> <export name="*"/> </library> - <library name="jetty-xml-9.4.16.v20190411.jar"> + <library name="opentelemetry-sdk-trace-1.19.0.jar"> <export name="*"/> </library> - <library name="jsr305-1.3.9.jar"> + <library name="opentelemetry-semconv-1.19.0-alpha.jar"> <export name="*"/> </library> - <library name="neko-htmlunit-2.35.0.jar"> + <library name="reactive-streams-1.0.3.jar"> <export name="*"/> </library> - <library name="okhttp-3.11.0.jar"> + <library name="salvation2-3.0.1.jar"> <export name="*"/> </library> - <library name="okio-1.14.0.jar"> + <library name="selenium-api-4.7.2.jar"> <export name="*"/> </library> - <library name="operadriver-1.5.jar"> + <library name="selenium-chrome-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="operalaunchers-1.1.jar"> + <library name="selenium-chromium-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="phantomjsdriver-1.2.1.jar"> + <library name="selenium-devtools-v106-4.7.2.jar"> <export name="*"/> </library> - <library name="protobuf-java-2.4.1.jar"> + <library name="selenium-devtools-v107-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-api-3.141.59.jar"> + <library name="selenium-devtools-v108-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-chrome-driver-3.141.59.jar"> + <library name="selenium-devtools-v85-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-edge-driver-3.141.59.jar"> + <library name="selenium-edge-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-firefox-driver-3.141.59.jar"> + <library name="selenium-firefox-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-ie-driver-3.141.59.jar"> + <library name="selenium-http-4.7.2.jar"> <export name="*"/> </library> - <library 
name="selenium-java-3.141.59.jar"> + <library name="selenium-ie-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-opera-driver-3.141.59.jar"> + <library name="selenium-java-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-remote-driver-3.141.59.jar"> + <library name="selenium-json-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-safari-driver-3.141.59.jar"> + <library name="selenium-manager-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-support-3.141.59.jar"> + <library name="selenium-remote-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="serializer-2.7.2.jar"> + <library name="selenium-safari-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="websocket-api-9.4.16.v20190411.jar"> + <library name="selenium-support-4.7.2.jar"> <export name="*"/> </library> - <library name="websocket-client-9.4.16.v20190411.jar"> + <library name="websocket-api-9.4.49.v20220914.jar"> <export name="*"/> </library> - <library name="websocket-common-9.4.16.v20190411.jar"> + <library name="websocket-client-9.4.49.v20220914.jar"> <export name="*"/> </library> - <library name="xalan-2.7.2.jar"> + <library name="websocket-common-9.4.49.v20220914.jar"> <export name="*"/> </library> - <library name="xercesImpl-2.12.0.jar"> + <library name="xercesImpl-2.12.2.jar"> <export name="*"/> </library> <library name="xml-apis-1.4.01.jar"> diff --git a/src/plugin/lib-selenium/ivy.xml b/src/plugin/lib-selenium/ivy.xml index 3004ed6d6..7d3a2d624 100644 --- a/src/plugin/lib-selenium/ivy.xml +++ b/src/plugin/lib-selenium/ivy.xml @@ -37,12 +37,7 @@ <dependencies> <!-- begin selenium dependencies --> - <dependency org="org.seleniumhq.selenium" name="selenium-java" rev="3.141.5" /> - <!-- - <dependency org="com.opera" name="operadriver" rev="1.5"> - <exclude org="org.seleniumhq.selenium" name="selenium-remote-driver" /> - </dependency> - --> + <dependency org="org.seleniumhq.selenium" 
name="selenium-java" rev="4.7.2" /> <!-- end selenium dependencies --> </dependencies> diff --git a/src/plugin/lib-selenium/plugin.xml b/src/plugin/lib-selenium/plugin.xml index bf50ca0a8..9ec85964f 100644 --- a/src/plugin/lib-selenium/plugin.xml +++ b/src/plugin/lib-selenium/plugin.xml @@ -29,64 +29,196 @@ <export name="*"/> </library> <!-- all classes from dependent libraries are exported --> - <library name="animal-sniffer-annotations-1.14.jar"> + <library name="async-http-client-2.12.3.jar"> <export name="*"/> </library> - <library name="byte-buddy-1.8.15.jar"> + <library name="async-http-client-netty-utils-2.12.3.jar"> <export name="*"/> </library> - <library name="checker-compat-qual-2.0.0.jar"> + <library name="auto-common-1.2.jar"> + <export name="*"/> + </library> + <library name="auto-service-1.0.1.jar"> + <export name="*"/> + </library> + <library name="auto-service-annotations-1.0.1.jar"> + <export name="*"/> + </library> + <library name="byte-buddy-1.12.18.jar"> + <export name="*"/> + </library> + <library name="checker-qual-3.12.0.jar"> <export name="*"/> </library> <library name="commons-exec-1.3.jar"> <export name="*"/> </library> - <library name="error_prone_annotations-2.1.3.jar"> + <library name="error_prone_annotations-2.11.0.jar"> + <export name="*"/> + </library> + <library name="failsafe-3.3.0.jar"> + <export name="*"/> + </library> + <library name="failureaccess-1.0.1.jar"> + <export name="*"/> + </library> + <library name="guava-31.1-jre.jar"> + <export name="*"/> + </library> + <library name="j2objc-annotations-1.3.jar"> + <export name="*"/> + </library> + <library name="jakarta.activation-1.2.2.jar"> + <export name="*"/> + </library> + <library name="jcommander-1.82.jar"> + <export name="*"/> + </library> + <library name="jsr305-3.0.2.jar"> + <export name="*"/> + </library> + <library name="jtoml-2.0.0.jar"> + <export name="*"/> + </library> + <library name="listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar"> + <export 
name="*"/> + </library> + <library name="netty-buffer-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-codec-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-codec-http-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-codec-socks-4.1.60.Final.jar"> + <export name="*"/> + </library> + <library name="netty-common-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-handler-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-handler-proxy-4.1.60.Final.jar"> + <export name="*"/> + </library> + <library name="netty-reactive-streams-2.0.4.jar"> + <export name="*"/> + </library> + <library name="netty-resolver-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-transport-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-transport-classes-epoll-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-transport-classes-kqueue-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-transport-native-epoll-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-transport-native-kqueue-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="netty-transport-native-unix-common-4.1.84.Final.jar"> + <export name="*"/> + </library> + <library name="opentelemetry-api-1.19.0.jar"> + <export name="*"/> + </library> + <library name="opentelemetry-api-logs-1.19.0-alpha.jar"> + <export name="*"/> + </library> + <library name="opentelemetry-context-1.19.0.jar"> + <export name="*"/> + </library> + <library name="opentelemetry-exporter-common-1.19.0.jar"> + <export name="*"/> + </library> + <library name="opentelemetry-exporter-logging-1.19.0.jar"> + <export name="*"/> + </library> + <library name="opentelemetry-sdk-1.19.0.jar"> + <export name="*"/> + </library> + <library name="opentelemetry-sdk-common-1.19.0.jar"> + <export name="*"/> + 
</library> + <library name="opentelemetry-sdk-extension-autoconfigure-1.19.0-alpha.jar"> + <export name="*"/> + </library> + <library name="opentelemetry-sdk-extension-autoconfigure-spi-1.19.0.jar"> + <export name="*"/> + </library> + <library name="opentelemetry-sdk-logs-1.19.0-alpha.jar"> + <export name="*"/> + </library> + <library name="opentelemetry-sdk-metrics-1.19.0.jar"> + <export name="*"/> + </library> + <library name="opentelemetry-sdk-trace-1.19.0.jar"> + <export name="*"/> + </library> + <library name="opentelemetry-semconv-1.19.0-alpha.jar"> + <export name="*"/> + </library> + <library name="reactive-streams-1.0.3.jar"> + <export name="*"/> + </library> + <library name="selenium-api-4.7.2.jar"> + <export name="*"/> + </library> + <library name="selenium-chrome-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="guava-25.0-jre.jar"> + <library name="selenium-chromium-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="j2objc-annotations-1.1.jar"> + <library name="selenium-devtools-v106-4.7.2.jar"> <export name="*"/> </library> - <library name="jsr305-1.3.9.jar"> + <library name="selenium-devtools-v107-4.7.2.jar"> <export name="*"/> </library> - <library name="okhttp-3.11.0.jar"> + <library name="selenium-devtools-v108-4.7.2.jar"> <export name="*"/> </library> - <library name="okio-1.14.0.jar"> + <library name="selenium-devtools-v85-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-api-3.141.5.jar"> + <library name="selenium-edge-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-chrome-driver-3.141.5.jar"> + <library name="selenium-firefox-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-edge-driver-3.141.5.jar"> + <library name="selenium-http-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-firefox-driver-3.141.5.jar"> + <library name="selenium-ie-driver-4.7.2.jar"> <export name="*"/> </library> - <library 
name="selenium-ie-driver-3.141.5.jar"> + <library name="selenium-java-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-java-3.141.5.jar"> + <library name="selenium-json-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-opera-driver-3.141.5.jar"> + <library name="selenium-manager-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-remote-driver-3.141.5.jar"> + <library name="selenium-remote-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-safari-driver-3.141.5.jar"> + <library name="selenium-safari-driver-4.7.2.jar"> <export name="*"/> </library> - <library name="selenium-support-3.141.5.jar"> + <library name="selenium-support-4.7.2.jar"> <export name="*"/> </library> </runtime> diff --git a/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java b/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java index 6af20b03d..4b998d1bc 100644 --- a/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java +++ b/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java @@ -59,10 +59,6 @@ import org.openqa.selenium.remote.RemoteWebDriver; import org.slf4j.Logger; import org.slf4j.LoggerFactory; -import org.openqa.selenium.opera.OperaOptions; -import org.openqa.selenium.opera.OperaDriver; -//import com.opera.core.systems.OperaDriver; - public class HttpWebClient { private static final Logger LOG = LoggerFactory @@ -88,13 +84,6 @@ public class HttpWebClient { "/root/chromedriver"); driver = createChromeWebDriver(chromeDriverPath, enableHeadlessMode); break; - // case "opera": - // // This class is provided as a convenience for easily testing the - // Chrome browser. 
-      // String operaDriverPath = conf.get("selenium.grid.binary",
-      // "/root/operadriver");
-      // driver = createOperaWebDriver(operaDriverPath, enableHeadlessMode);
-      // break;
       case "remote":
         String seleniumHubHost = conf.get("selenium.hub.host", "localhost");
         int seleniumHubPort = Integer
@@ -183,23 +172,6 @@ public class HttpWebClient {
     return driver;
   }
 
-  public static WebDriver createOperaWebDriver(String operaDriverPath,
-      boolean enableHeadlessMode) {
-    // if not specified, WebDriver will search your path for operadriver
-    System.setProperty("webdriver.opera.driver", operaDriverPath);
-    OperaOptions operaOptions = new OperaOptions();
-    // operaOptions.setBinary("/usr/bin/opera");
-    operaOptions.addArguments("--no-sandbox");
-    operaOptions.addArguments("--disable-extensions");
-    // be sure to set selenium.enable.headless to true if no monitor attached
-    // to your server
-    if (enableHeadlessMode) {
-      operaOptions.addArguments("--headless");
-    }
-    WebDriver driver = new OperaDriver(operaOptions);
-    return driver;
-  }
-
   public static RemoteWebDriver createFirefoxRemoteWebDriver(URL seleniumHubUrl,
       boolean enableHeadlessMode) {
     FirefoxOptions firefoxOptions = new FirefoxOptions();
diff --git a/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers/DefaultClickAllAjaxLinksHandler.java b/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers/DefaultClickAllAjaxLinksHandler.java
index a4b3761ee..4f05acd3d 100644
--- a/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers/DefaultClickAllAjaxLinksHandler.java
+++ b/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers/DefaultClickAllAjaxLinksHandler.java
@@ -18,6 +18,7 @@ package org.apache.nutch.protocol.interactiveselenium.handlers;
 
 import java.lang.invoke.MethodHandles;
 import java.util.List;
+import java.time.Duration;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.util.StringUtils;
@@ -47,7 +48,7 @@ public class DefaultClickAllAjaxLinksHandler implements InteractiveSeleniumHandl
 
       driver.findElement(By.tagName("body")).getAttribute("innerHTML");
       Configuration conf = NutchConfiguration.create();
-      new WebDriverWait(driver, conf.getLong("libselenium.page.load.delay", 3));
+      new WebDriverWait(driver, Duration.ofSeconds(conf.getLong("libselenium.page.load.delay", 3)));
 
       List<WebElement> atags = driver.findElements(By.tagName("a"));
       int numberofajaxlinks = atags.size();
@@ -72,8 +73,8 @@ public class DefaultClickAllAjaxLinksHandler implements InteractiveSeleniumHandl
 
           // refreshing the handlers as the page was interacted with
           driver.navigate().refresh();
-          new WebDriverWait(driver, conf.getLong("libselenium.page.load.delay",
-              3));
+          new WebDriverWait(driver, Duration.ofSeconds(
+              conf.getLong("libselenium.page.load.delay", 3)));
           atags = driver.findElements(By.tagName("a"));
         }
       }
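
The last two hunks replace a bare `long` number of seconds with a `java.time.Duration` when constructing `WebDriverWait`. The conversion itself can be sketched with the standard library alone. This is only an illustration under stated assumptions: the class `PageLoadDelay`, its `fromConfig` method, and the `Map` used as a stand-in for Hadoop's `Configuration` are hypothetical and not part of the commit; only the property name and the default of 3 come from the diff.

```java
import java.time.Duration;
import java.util.Map;

// Hypothetical helper, not part of the commit. It mirrors the
// conf.getLong("libselenium.page.load.delay", 3) lookup from the diff,
// with a plain Map standing in for Hadoop's Configuration.
public class PageLoadDelay {

  public static final String KEY = "libselenium.page.load.delay";
  public static final long DEFAULT_SECONDS = 3L;

  // Reads the delay in seconds (falling back to 3) and wraps it in a
  // Duration, the type the diff now passes to WebDriverWait.
  public static Duration fromConfig(Map<String, String> conf) {
    String raw = conf.get(KEY);
    long seconds = (raw == null) ? DEFAULT_SECONDS : Long.parseLong(raw.trim());
    return Duration.ofSeconds(seconds);
  }

  public static void main(String[] args) {
    System.out.println(fromConfig(Map.of()));          // prints PT3S
    System.out.println(fromConfig(Map.of(KEY, "10"))); // prints PT10S
  }
}
```

In the handler, the resulting value would take the place of the `Duration.ofSeconds(...)` expression shown in the diff. Note that a `WebDriverWait` only blocks once `until(...)` is called on it; constructing it, as the diff does, sets the timeout but does not wait by itself.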