[ 
https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781485#comment-17781485
 ] 

Tim Allison edited comment on NUTCH-3018 at 10/31/23 6:55 PM:
--------------------------------------------------------------

On further reflection, what the above means is that if each of our threads 
creates its own web driver for every fetch, the selenium instance blocks the 
creation of new web drivers until the current number of connections is less 
than the number of worker nodes TIMES SE_NODE_MAX_SESSIONS.

In short, we're already rate-limited by selenium.  We may as well rate limit 
ourselves and reuse drivers when we can?
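
As a rough illustration of "rate limit ourselves and reuse drivers", here is a 
minimal sketch of a bounded pool. It is not existing Nutch code: DriverPool, 
withDriver and capacity are hypothetical names, the driver type is left 
generic so the sketch runs without Selenium on the classpath, and the node and 
session counts in main are made-up values. In a real plugin the factory would 
presumably create a remote web driver.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical bounded pool: creates drivers lazily and reuses idle ones. */
final class DriverPool<D> {
    private final Semaphore permits;            // caps drivers in use at once
    private final ConcurrentLinkedQueue<D> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<D> factory;          // assumed to build one driver
    private final AtomicInteger created = new AtomicInteger();

    DriverPool(int maxDrivers, Supplier<D> factory) {
        this.permits = new Semaphore(maxDrivers);
        this.factory = factory;
    }

    /** Upper bound taken from the comment: worker nodes TIMES sessions per node. */
    static int capacity(int workerNodes, int maxSessionsPerNode) {
        return workerNodes * maxSessionsPerNode;
    }

    /** Runs the task with a pooled driver, blocking while all drivers are busy. */
    <R> R withDriver(Function<D, R> task) throws InterruptedException {
        permits.acquire();
        D driver = null;
        try {
            driver = idle.poll();               // reuse an idle driver if any
            if (driver == null) {
                driver = factory.get();         // otherwise pay the init cost once
                created.incrementAndGet();
            }
            return task.apply(driver);
        } finally {
            // A real pool would health-check the driver before returning it.
            if (driver != null) {
                idle.offer(driver);
            }
            permits.release();
        }
    }

    int createdCount() {
        return created.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed values: 2 worker nodes, SE_NODE_MAX_SESSIONS = 3.
        DriverPool<String> pool =
            new DriverPool<>(capacity(2, 3), () -> "fake-driver");
        for (int i = 0; i < 5; i++) {
            System.out.println(pool.withDriver(d -> d + " fetched a page"));
        }
        System.out.println("drivers created: " + pool.createdCount());
    }
}
```

Because a driver is only created while holding a permit and finding no idle 
driver, the pool never holds more drivers than its capacity, so fetcher threads 
would wait locally instead of queueing on the selenium side.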


> Consider pooling remote webdrivers for Selenium?
> ------------------------------------------------
>
>                 Key: NUTCH-3018
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3018
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> It looks like it takes between 2x and 4x as long to initialize the remote 
> webdriver in selenium as it does to render/fetch a couple of test pages I'm 
> working with.
> On a mac with a chrome driver, ~1.5 seconds to load the driver and then ~0.5 
> seconds to fetch/render the page. On a mac, ~1.2 seconds to load and then 
> ~0.5 seconds to fetch/render.  
> On a mac with a firefox driver, ~3.7 seconds to load the driver and ~1 second 
> to fetch/render a page.
> Is it worth pooling webdrivers or does that add too much complexity/overhead?
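
To put the timings above in perspective, here is a small back-of-the-envelope 
sketch. It assumes a single thread, a hypothetical batch of 100 pages, and that 
the rough chrome and firefox figures quoted above hold for every page; 
DriverReuseEstimate and totalMillis are illustrative names only.

```java
/** Hypothetical estimate of driver start-up cost with and without reuse. */
final class DriverReuseEstimate {

    /**
     * Total time in milliseconds to fetch the given number of pages on one
     * thread. Without reuse the driver is initialized for every page; with
     * reuse it is initialized once.
     */
    static long totalMillis(long initMs, long fetchMs, int pages, boolean reuse) {
        long inits = reuse ? 1 : pages;
        return inits * initMs + (long) pages * fetchMs;
    }

    public static void main(String[] args) {
        int pages = 100; // assumed batch size
        // Timings in ms taken from the approximate figures in the description.
        System.out.println("chrome, new driver per fetch: "
            + totalMillis(1500, 500, pages, false) + " ms");
        System.out.println("chrome, one reused driver:    "
            + totalMillis(1500, 500, pages, true) + " ms");
        System.out.println("firefox, new driver per fetch: "
            + totalMillis(3700, 1000, pages, false) + " ms");
        System.out.println("firefox, one reused driver:    "
            + totalMillis(3700, 1000, pages, true) + " ms");
    }
}
```

Under these assumptions most of the wall-clock time without reuse is spent 
starting drivers, which is the cost a pool would amortize.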



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
