[GitHub] himanshug edited a comment on issue #6176: CuratorInventoryManager may not report inventory as initialized

GitBox Tue, 21 Aug 2018 12:29:56 -0700

himanshug edited a comment on issue #6176: CuratorInventoryManager may not
report inventory as initialized
URL:
https://github.com/apache/incubator-druid/issues/6176#issuecomment-414780324

@jihoonson
My test environment had 3 brokers, 2 coordinators, 2 overlords, ~40 Middle
Managers (each running about 6 kafka indexing tasks created by kafka
supervisor), about 15-20 Historicals.

Some background and information is noted in
https://groups.google.com/forum/#!msg/druid-development/eIWDPfhpM_U/AzMRxSQGAgAJ
but there are 3 completely _independent_ things here:
1) switching coordinator to use HTTP (using `HttpLoadQueuePeon`) for segment
assignment (load/drop)
2) switching broker/coordinator to use HTTP (using
`HttpServerInventoryView`) for discovering what segments are served by
queryable nodes (historicals, and peons doing indexing)
3) switching overlord to use HTTP for task mgmt (using
`HttpRemoteTaskRunner`)

In my comment above I was talking about making (1) and (2) the default
after a bit more testing on some more clusters, such as the ones you have.

It looks like #6201 pertains to (3), so let us not consider enabling (3) by
default until we get to the bottom of #6201.

However, once (1), (2) and (3) are done and Druid clusters are using HTTP,
and once we remove the coordinator/overlord service announcement that is
always done in ZK to support tranquility, it technically becomes possible to
write extensions for discovery that don't necessarily use ZooKeeper and use,
say, etcd instead. That is also an independent activity which will take its
own time, so I don't want to make it a prerequisite for trying out HTTP or
defaulting to it as we gain more confidence in those features. ZooKeeper code
that is no longer needed can then be removed in phases (i.e. after, say, 4-6
months from a release where a specific thing was made the default).

Each of (1) and (2) leads to one additional connection per broker/coordinator
to each queryable node.
(3) leads to one additional connection per overlord to each MiddleManager
node.
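
As a rough illustration of those counts, here is a minimal Python sketch. The
node counts come from the test environment described above (taking the upper
figure of 20 Historicals and roughly 40 x 6 indexing peons). The function name
and the simplification that every process connects to every target node are
assumptions for illustration, not measurements.

```python
def extra_connections_per_process(target_nodes: int, features_enabled: int = 1) -> int:
    """One additional connection to each target node per enabled feature."""
    return target_nodes * features_enabled


# Approximate figures from the test environment described above.
historicals = 20                # "about 15-20 Historicals", upper figure
middle_managers = 40
peons = middle_managers * 6     # ~6 Kafka indexing tasks per MiddleManager
queryable_nodes = historicals + peons

# (2) HttpServerInventoryView: each broker/coordinator -> each queryable node.
per_broker = extra_connections_per_process(queryable_nodes)
# (3) HttpRemoteTaskRunner: each overlord -> each MiddleManager.
per_overlord = extra_connections_per_process(middle_managers)

print(per_broker)    # 260
print(per_overlord)  # 40
```

So under these assumptions the increase is in the hundreds per process, not
thousands.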

On the broker/coordinator/overlord side, the `EscalatedGlobal httpClient` is
used for making requests, so connections come from its existing pools and no
new connection pools are created.

> One thing I'm concerned is the increasing HTTP connections.

Theoretically, it should be OK, and so far in the testing above I haven't seen
any connection issues popping up due to these features. But the concern is
valid, and we can be more confident only as we roll it out on more clusters.

> On the other day, I could see Kafka indexing service was using too many
HTTP connections compared to the number of worker threads even though the
cluster was not using HTTP-based overlords or coordinators. The number of HTTP
connections was a few thousand which is not so high, but I'm not sure what is
the proper default configuration for the number of worker threads.

I am assuming you meant that the overlord's HTTP client [worker threads] had
thousands of open outbound connections.
For the `EscalatedGlobal` client, which KIS uses as well, the number of
connections is set at
https://github.com/apache/incubator-druid/blob/master/server/src/main/java/io/druid/guice/http/HttpClientModule.java#L140
(default value is 20).
So, at the overlord, the maximum possible number of connections from that
httpClient = 20 (or whatever is configured) × (number of KIS task peons, plus
any other processes that the overlord could talk to using this client over
HTTP).
From
https://github.com/apache/incubator-druid/blob/master/server/src/main/java/io/druid/initialization/Initialization.java#L377
I see there are at least 3 other HttpClient instances created with their own
connection pools, so check whether those are using the connections.
If the above accounts for the thousands of connections, then it is explained;
otherwise there is some bug in the `HttpClient` code and it creates more
connections than it is told to.
It would be good if you could look at which host:port those connections are
going to and see whether the connection counts make sense given the
expectations above.
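
A minimal Python sketch of that check, assuming the remote `host:port` of
each open connection has already been collected into a list (for example from
OS-level connection listings). The function names and the sample addresses
are hypothetical. The limit of 20 is the default noted above, treated here as
a per-destination limit.

```python
from collections import Counter

DEFAULT_CONNECTIONS_PER_DESTINATION = 20  # default noted above


def max_possible_connections(destinations, per_destination=DEFAULT_CONNECTIONS_PER_DESTINATION):
    """Upper bound: configured limit times the number of distinct destinations."""
    return destinations * per_destination


def count_by_destination(remote_endpoints):
    """Group open connections by their remote host:port."""
    return Counter(remote_endpoints)


def destinations_over_limit(counts, per_destination=DEFAULT_CONNECTIONS_PER_DESTINATION):
    """Destinations with more connections than a single client pool should open."""
    return {dest: n for dest, n in counts.items() if n > per_destination}


# Hypothetical sample: 3 connections to one peon, 25 to another.
sample = ["10.0.0.1:8100"] * 3 + ["10.0.0.2:8100"] * 25
counts = count_by_destination(sample)

print(max_possible_connections(len(counts)))  # 40
print(destinations_over_limit(counts))        # {'10.0.0.2:8100': 25}
```

A destination that exceeds the limit would point either at one of the other
HttpClient instances or at a bug in the pool, as described above.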

That said, the features in (1), (2), (3) don't necessarily worsen the
situation, because we already have far more HTTP requests going on all around
due to other features. I may be proven wrong in the end, but we wouldn't know
till we try :)

