[jira] [Updated] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3018: --- Description: It looks like it takes between 2x and 4x of the time to initialize the remote
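NUTCH-3018 proposes pooling remote webdrivers because initializing a remote driver reportedly takes 2x to 4x as long (the description above is truncated). A later comment also notes that no more web drivers can be created than some limit (that snippet is truncated too), which a fixed-size pool would respect. Below is a minimal sketch of such a pool, assuming a fixed pool size and a generic type `T` standing in for Selenium's `RemoteWebDriver`. The class `DriverPool` and its methods are hypothetical and are not part of Nutch's API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical fixed-size pool. In the Selenium case T would be a remote
 * webdriver; the sketch is generic so it runs without Selenium.
 */
final class DriverPool<T> implements AutoCloseable {
  private final BlockingQueue<T> idle;   // thread-safe, shared by fetcher threads
  private final Supplier<T> factory;     // expensive: e.g. opens a remote session
  private final Consumer<T> destroyer;   // e.g. ends the remote session

  DriverPool(int size, Supplier<T> factory, Consumer<T> destroyer) {
    if (size <= 0) throw new IllegalArgumentException("size must be positive");
    this.idle = new ArrayBlockingQueue<>(size);
    this.factory = factory;
    this.destroyer = destroyer;
    // Pay the initialization cost once, up front, instead of once per fetch.
    for (int i = 0; i < size; i++) {
      idle.add(factory.get());
    }
  }

  /** Borrows an instance, waiting up to the timeout; returns null on timeout or interrupt. */
  T borrow(long timeout, TimeUnit unit) {
    try {
      return idle.poll(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve interrupt status for the caller
      return null;
    }
  }

  /** Returns an instance for reuse; if the pool is already full, the instance is destroyed. */
  void release(T item) {
    if (!idle.offer(item)) {
      destroyer.accept(item);
    }
  }

  /** Discards a broken instance (e.g. a dead remote session) and adds a fresh one. */
  void invalidate(T item) {
    destroyer.accept(item);
    release(factory.get());
  }

  int available() {
    return idle.size();
  }

  /** Destroys idle instances only; callers must release borrowed ones first. */
  @Override
  public void close() {
    T item;
    while ((item = idle.poll()) != null) {
      destroyer.accept(item);
    }
  }
}
```

With Selenium on the classpath, the factory could plausibly be `() -> new RemoteWebDriver(gridUrl, options)` and the destroyer `WebDriver::quit`; that wiring, including `gridUrl` and `options`, is an assumption for illustration only. Bounding the pool with a blocking queue makes fetcher threads wait for a free driver instead of requesting more sessions than the remote end can serve.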

[jira] [Comment Edited] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781485#comment-17781485 ] Tim Allison edited comment on NUTCH-3018 at 10/31/23 6:55 PM: -- On further

[jira] [Commented] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781485#comment-17781485 ] Tim Allison commented on NUTCH-3018: On further reflection, what the above means is that if each of

[jira] [Comment Edited] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781483#comment-17781483 ] Tim Allison edited comment on NUTCH-3018 at 10/31/23 6:46 PM: -- It looks like

[jira] [Commented] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781483#comment-17781483 ] Tim Allison commented on NUTCH-3018: It looks like we cannot create more web drivers than the

[jira] [Commented] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-10-31 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781482#comment-17781482 ] Tim Allison commented on NUTCH-3019: Separately, I noticed that logging from Tika was not working

[jira] [Created] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-10-31 Tim Allison (Jira)
Tim Allison created NUTCH-3019: -- Summary: Upgrade to Apache Tika 2.9.1 Key: NUTCH-3019 URL: https://issues.apache.org/jira/browse/NUTCH-3019 Project: Nutch Issue Type: Task

[jira] [Resolved] (NUTCH-2959) Upgrade to Apache Tika 2.9.0

2023-10-31 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2959. Resolution: Fixed > Upgrade to Apache Tika 2.9.0

[jira] [Created] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Tim Allison (Jira)
Tim Allison created NUTCH-3018: -- Summary: Consider pooling remote webdrivers for Selenium? Key: NUTCH-3018 URL: https://issues.apache.org/jira/browse/NUTCH-3018 Project: Nutch Issue Type: Task

[jira] [Commented] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-10-31 ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781302#comment-17781302 ] ASF GitHub Bot commented on NUTCH-3017: --- sebastian-nagel commented on code in PR #793: URL:
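NUTCH-3017 asks for fast-urlfilter to load its rules from HDFS/S3 and to accept gzipped input; the snippets here do not show how PR #793 does it. Below is a minimal sketch of the gzip half, assuming the raw `InputStream` is obtained elsewhere, for example through Hadoop's `path.getFileSystem(conf).open(path)`, which is one way an HDFS or S3 location could be read. That wiring is left out so the sketch runs without Hadoop, and the class `RuleFileReader` is hypothetical. Gzip is detected by its two magic bytes (`0x1f 0x8b`) rather than by file extension.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: opens a rule file as UTF-8 text, gunzipping if needed. */
final class RuleFileReader {
  private RuleFileReader() {}

  static BufferedReader open(InputStream raw) throws IOException {
    BufferedInputStream in = new BufferedInputStream(raw);
    // Peek at the first two bytes without consuming them.
    in.mark(2);
    int b1 = in.read();
    int b2 = in.read();
    in.reset();
    InputStream content = in;
    if (b1 == 0x1f && b2 == 0x8b) {
      // Gzip magic number found: decompress transparently.
      content = new GZIPInputStream(in);
    }
    return new BufferedReader(new InputStreamReader(content, StandardCharsets.UTF_8));
  }
}
```

Sniffing the magic bytes works whatever the file is named. An extension-based alternative inside Hadoop would be `new CompressionCodecFactory(conf).getCodec(path)`, which returns `null` for uncompressed files and would also cover codecs other than gzip.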

Re: [PR] [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 [nutch]

2023-10-31 via GitHub
sebastian-nagel commented on code in PR #793: URL: https://github.com/apache/nutch/pull/793#discussion_r1377375552 ## src/plugin/urlfilter-fast/src/java/org/apache/nutch/urlfilter/fast/FastURLFilter.java: ## @@ -181,9 +186,23 @@ public String filter(String url) { public

[jira] [Commented] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-10-31 ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781301#comment-17781301 ] ASF GitHub Bot commented on NUTCH-3017: --- sebastian-nagel commented on code in PR #793: URL: