himanshug edited a comment on issue #6176: CuratorInventoryManager may not 
report inventory as initialized
URL: 
https://github.com/apache/incubator-druid/issues/6176#issuecomment-414780324
 
 
   @jihoonson 
   My test environment had 3 brokers, 2 coordinators, 2 overlords, ~40 Middle 
Managers (each running about 6 Kafka indexing tasks created by the Kafka 
supervisor), and about 15-20 Historicals.
   
   Some background is noted in 
https://groups.google.com/forum/#!msg/druid-development/eIWDPfhpM_U/AzMRxSQGAgAJ
   but there are 3 completely _independent_ things here:
   1) switching coordinator to use HTTP (using `HttpLoadQueuePeon`) for segment 
assignment (load/drop)
   2) switching broker/coordinator to use HTTP (using 
`HttpServerInventoryView`) for discovering what segments are served by 
queryable nodes (historicals, and peons doing indexing)
   3) switching overlord to use HTTP for task mgmt (using 
`HttpRemoteTaskRunner`)
   
   In my comment above I was talking about making (1) and (2) the default 
after a bit of testing on some more of your clusters.
   
   It looks like #6201 pertains to (3), so let us not consider enabling (3) by 
default until we get to the bottom of #6201.
   
   However, once (1), (2) and (3) are done and Druid clusters are using HTTP, 
and once we remove the coordinator/overlord service announcement that is always 
done in ZK to support Tranquility, it becomes technically possible to write 
discovery extensions that don't necessarily use ZooKeeper and use, say, etcd 
instead. That is also an independent activity which will take its own time, so 
I don't want to make it a prerequisite for trying out HTTP, or for defaulting 
to it as we gain more confidence in those features. We can then remove the 
ZooKeeper code that is no longer needed in phases (i.e. say 4-6 months after a 
release in which the specific thing was made default).
   
   Each of (1) and (2) leads to one additional connection per 
broker/coordinator to each queryable node.
   (3) leads to one additional connection per overlord to each MiddleManager 
node.
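
   To make the counts above concrete, here is a rough sketch that plugs in the 
upper end of the test environment figures. The class and method names are 
hypothetical, and treating (1) as applying to coordinators only is my 
assumption based on its description; this is an estimate, not a measurement.

```java
// Rough estimate of the extra long-lived HTTP connections per feature.
// Node counts come from the test environment described in this comment;
// ExtraConnectionEstimate and connections() are hypothetical names.
final class ExtraConnectionEstimate {
    // One connection from every client process to every target process.
    static long connections(int clients, int targets) {
        return (long) clients * targets;
    }

    public static void main(String[] args) {
        int brokers = 3, coordinators = 2, overlords = 2;
        int middleManagers = 40, tasksPerMiddleManager = 6, historicals = 20;

        // Queryable nodes = historicals + peons doing indexing.
        int queryable = historicals + middleManagers * tasksPerMiddleManager;

        // (1) segment load/drop: assumed to be coordinators only.
        long loadQueue = connections(coordinators, queryable);
        // (2) segment discovery: brokers and coordinators.
        long inventory = connections(brokers + coordinators, queryable);
        // (3) task management: overlords to MiddleManagers.
        long taskMgmt = connections(overlords, middleManagers);

        System.out.println("(1) " + loadQueue);
        System.out.println("(2) " + inventory);
        System.out.println("(3) " + taskMgmt);
    }
}
```

   With these figures the sketch gives 520 for (1), 1300 for (2) and 80 for 
(3), spread across all brokers, coordinators and overlords.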
   
   On the broker/coordinator/overlord side, `EscalatedGlobal httpClient` is 
used for making requests, so connections from its existing pools are used and 
no new connection pools are created.
   
   > One thing I'm concerned is the increasing HTTP connections.
   
   Theoretically it should be OK, and in the testing described above I haven't 
seen any connection issues popping up due to these features. But the concern 
is valid, and we can be more confident only as we roll it out on more clusters.
   
   > On the other day, I could see Kafka indexing service was using too many 
HTTP connections compared to the number of worker threads even though the 
cluster was not using HTTP-based overlords or coordinators. The number of HTTP 
connections was a few thousand which is not so high, but I'm not sure what is 
the proper default configuration for the number of worker threads.
   
   I am assuming you meant that the overlord HTTP client [worker threads] had 
thousands of outbound open connections.
   For the `EscalatedGlobal` client, which KIS uses as well, the number of 
connections is set at 
https://github.com/apache/incubator-druid/blob/master/server/src/main/java/io/druid/guice/http/HttpClientModule.java#L140
 (default value is 20).
   So, at the overlord, the maximum possible number of connections from that 
httpClient = 20 (or whatever is configured) x (number of KIS task peons, plus 
any other processes that the overlord could talk to over HTTP using this client).
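
   As a quick sanity check of that bound, here is a minimal sketch using the 
test environment above (~40 Middle Managers x ~6 tasks each). The class and 
method names are hypothetical, and the peon count is only an approximation of 
the number of destinations.

```java
// Upper bound on connections from one HTTP client that keeps a
// separate pool per destination. PoolConnectionBound is a hypothetical name.
final class PoolConnectionBound {
    static long maxConnections(int connectionsPerDestination, int destinations) {
        if (connectionsPerDestination < 0 || destinations < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return (long) connectionsPerDestination * destinations;
    }

    public static void main(String[] args) {
        int numConnections = 20;   // default mentioned above
        int kisPeons = 40 * 6;     // ~40 Middle Managers, ~6 tasks each
        System.out.println(maxConnections(numConnections, kisPeons));
    }
}
```

   For 240 peons this gives 4800, so a few thousand open connections from the 
overlord is within what the default setting allows.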
   From 
https://github.com/apache/incubator-druid/blob/master/server/src/main/java/io/druid/initialization/Initialization.java#L377
 , I see there are at least 3 other HttpClient instances created with their own 
connection pools, so check whether those are using the connections.
   If the above accounts for the thousands of connections, then it is 
explained; otherwise there is some bug in the `HttpClient` code and it creates 
more connections than it is told to.
   It would be good if you could look at which host:port those connections are 
going to and see whether the connection counts make sense given the 
expectations above.
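
   One way to do that tally is sketched below. It assumes you have captured 
`netstat -tn` style lines on the overlord host, where the fifth column is the 
remote address and the sixth is the state. The class name is hypothetical and 
any sample lines you feed it are your own data; nothing here is Druid code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Counts ESTABLISHED connections per remote host:port from
// netstat-style lines. ConnectionTally is a hypothetical helper.
final class ConnectionTally {
    static Map<String, Integer> countByRemote(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers, short lines, and sockets that are not established.
            if (cols.length < 6 || !"ESTABLISHED".equals(cols[5])) {
                continue;
            }
            counts.merge(cols[4], 1, Integer::sum);
        }
        return counts;
    }
}
```

   Any remote host:port whose count exceeds the configured per-destination 
limit would point towards the bug scenario rather than the expected one.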
   
   That said, the features in (1), (2), (3) don't necessarily worsen the 
situation, because we already have far more HTTP requests going on all around 
due to other features. I may be proven wrong in the end, but we wouldn't know 
till we try :)
   
   
   

