Yeah, they can.  Our production setup is something like 50+ shard servers in
a single cluster, and there are 2 production clusters (so over 100 shard
servers).  Each server has 96 GB of RAM and 12 hyper-threaded cores.  Each
shard server runs with blur.shard.server.thread.count set to 128 (I think).
We also have 4 controllers that each have the settings I discussed earlier
(blur.controller.server.thrift.thread.count=128 and
blur.controller.server.remote.thread.count=2000).  So I suppose that we
could in theory have up to ~150 concurrent queries without blocking,
assuming an even distribution of queries across the controllers.
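
One way to arrive at a figure like that is sketched below. This is only a rough model, not anything taken from Blur itself: it assumes each query holds one controller thrift thread and one remote thread per shard server for its whole lifetime, and that queries spread evenly over the controllers. The function name and its parameters are made up for illustration.

```python
def max_concurrent_queries(controllers, thrift_threads, remote_threads, shard_servers):
    """Rough upper bound on queries that can run without waiting for a controller thread.

    Hypothetical model: one thrift thread per query, plus one remote
    thread per shard server per query, with queries spread evenly.
    """
    # Each query fans out to every shard server, so the remote pool
    # can serve remote_threads // shard_servers queries at a time.
    by_remote_pool = remote_threads // shard_servers
    # A controller is limited by whichever pool runs out first.
    per_controller = min(thrift_threads, by_remote_pool)
    return controllers * per_controller


# Numbers from this message: 4 controllers, 128 thrift / 2000 remote threads,
# and exactly 50 shard servers (the real cluster has "50+").
print(max_concurrent_queries(4, 128, 2000, 50))  # 160

# Shipped defaults (32 thrift / 64 remote) on one controller with 32 shard servers.
print(max_concurrent_queries(1, 32, 64, 32))  # 2
```

With a few more than 50 shard servers the first result drops toward the ~150 estimate. Under the same model, the second call shows why the default settings can starve quickly: two slow queries are enough to occupy the whole remote pool.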

Aaron


On Thu, Feb 14, 2013 at 9:27 PM, Garrett Barton <[email protected]> wrote:

> After reading what Aaron said I'm picturing something like:
> blur.controller.server.thrift.thread.count = (num of concurrent queries)
> blur.controller.server.remote.thread.count =
>   (blur.controller.server.thrift.thread.count * num shards)
> blur.shard.server.thread.count =
>   (blur.controller.server.remote.thread.count * num controllers)
>
> I didn't really know what they were, so I had rounded everything to 100.  I
> think the last 2 need to be much higher now, though these numbers could
> get huge with a decent-sized cluster.  What do you think, Aaron?
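
To see how quickly those numbers grow, the rule of thumb proposed above can be written out as a small calculation. This is just a sketch of that proposal, not a recommended or official sizing formula; the function name is hypothetical, and the inputs (100 concurrent queries, 50 shard servers, 4 controllers) are borrowed from figures mentioned elsewhere in this thread.

```python
def proposed_thread_counts(concurrent_queries, shard_servers, controllers):
    """Thread counts implied by the rule of thumb quoted above (illustrative only)."""
    thrift = concurrent_queries          # one controller thrift thread per query
    remote = thrift * shard_servers      # each query fans out to every shard server
    shard = remote * controllers         # the proposal multiplies again by controllers
    return {
        "blur.controller.server.thrift.thread.count": thrift,
        "blur.controller.server.remote.thread.count": remote,
        "blur.shard.server.thread.count": shard,
    }


for name, value in proposed_thread_counts(100, 50, 4).items():
    print(f"{name}={value}")
```

With these inputs the remote pool comes out at 5000 threads and the shard server pool at 20000, which is the "huge" growth the message worries about.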
>
>
> On Thu, Feb 14, 2013 at 8:09 PM, Tim Williams <[email protected]>
> wrote:
>
> > Thanks, I think that helps.  I assume there's a relationship between
> > these and blur.shard.server.thrift.thread.count?  Maybe in practice
> > blur.shard.server.thread.count should be at least equal to the
> > controller's incoming thrift thread limit?
> >
> > Thanks,
> > --tim
> >
> > On Thu, Feb 14, 2013 at 9:26 AM, Aaron McCurry <[email protected]>
> > wrote:
> > > My guess is that it's thread starvation in the controllers.  See file:
> > >
> > > https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-util/src/main/resources/blur-default.properties;h=50900056a7507528f1f71d645ce84d5246f6892b;hb=b89d456411e0a184dee1a63709ba7c175ec4dcef
> > >
> > > blur.controller.server.thrift.thread.count=32
> > >
> > > The number of thrift requests that the controller can handle, meaning a
> > > single query will use just one of these.  We run 128 in production on
> > > this setting.
> > >
> > > blur.controller.server.remote.thread.count=64
> > >
> > > The number of remote calls to shard servers, meaning if you have 32
> > > shard servers a single query will use 32 of these threads.  We run
> > > 2000 in production on this setting.
> > >
> > > Aaron
> > >
> > >
> > > On Thu, Feb 14, 2013 at 9:17 AM, Tim Williams <[email protected]>
> > > wrote:
> > >
> > >> When an evil query (e.g. leading wildcard) is received, the
> > >> controllers become unresponsive until the query is either killed or
> > >> finished.  Killing it is actually very difficult without responsive
> > >> controllers :(  The odd thing is, the controller server itself doesn't
> > >> seem to be under much load during that time.  Has anyone seen this before?
> > >>
> > >> Thanks,
> > >> --tim
> > >>
> >
>
