Yeah, they can. Our production setup is something like 50+ shard servers in a single cluster, and there are 2 production clusters (so over 100 shard servers). Each server has 96 GB of RAM and 12 hyper-threaded cores. Each shard server runs with blur.shard.server.thread.count set to 128 threads (I think). We also have 4 controllers, each with the settings I discussed earlier (blur.controller.server.thrift.thread.count=128 and blur.controller.server.remote.thread.count=2000). So I suppose that we could in theory have up to ~150 concurrent queries without blocking, assuming an even distribution of queries across the controllers.
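One way to arrive at a figure in that range, as a rough sketch only: assume each query holds one controller thrift thread plus one remote thread per shard server for as long as it runs (as described earlier in the thread), and take 50 as the shard server count. The function name and the model itself are illustrative assumptions, not anything Blur provides.

```python
def max_concurrent_queries(controllers, thrift_threads, remote_threads, shard_servers):
    """Estimate how many queries a cluster can run before a controller pool blocks.

    Hypothetical helper: a back-of-the-envelope model, not part of Blur.
    """
    # Each query occupies one thrift thread on the controller that received it.
    limit_by_thrift = thrift_threads
    # Each query fans out to every shard server, one remote thread per server.
    limit_by_remote = remote_threads // shard_servers
    # The tighter of the two pools decides the per-controller limit.
    per_controller = min(limit_by_thrift, limit_by_remote)
    # Assumes queries are spread evenly across the controllers.
    return controllers * per_controller


estimate = max_concurrent_queries(
    controllers=4, thrift_threads=128, remote_threads=2000, shard_servers=50
)
print(estimate)
```

Under these assumptions the remote pool is the bottleneck (2000 / 50 = 40 queries per controller), giving 160 across 4 controllers; with slightly more than 50 shard servers the estimate drops toward 150.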
Aaron

On Thu, Feb 14, 2013 at 9:27 PM, Garrett Barton <[email protected]> wrote:

> After reading what Aaron said I'm picturing something like:
> blur.controller.server.thrift.thread.count = (num of concurrent queries)
> blur.controller.server.remote.thread.count = (blur.controller.server.thrift.thread.count*num shards)
> blur.shard.server.thread.count = (blur.controller.server.remote.thread.count*num controllers)
>
> I didn't really know what they were so I had rounded everything to 100, I
> think the last 2 need to be much higher now. Though these numbers could
> get huge with a decent sized cluster. What you think Aaron?
>
>
> On Thu, Feb 14, 2013 at 8:09 PM, Tim Williams <[email protected]> wrote:
>
> > Thanks I think that helps.. i assume there's a relationship between
> > these and blur.shard.server.thrift.thread.count? Maybe in practice
> > blur.shard.server.thread.count should at least be equivalent to the
> > controller incoming thrift thread limit?
> >
> > Thanks,
> > --tim
> >
> > On Thu, Feb 14, 2013 at 9:26 AM, Aaron McCurry <[email protected]> wrote:
> > > My guess is that it's thread starvation in the controllers. See file:
> > >
> > > https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-util/src/main/resources/blur-default.properties;h=50900056a7507528f1f71d645ce84d5246f6892b;hb=b89d456411e0a184dee1a63709ba7c175ec4dcef
> > >
> > > blur.controller.server.thrift.thread.count=32
> > >
> > > The number of thrift requests that the controller can handle, meaning a
> > > single query will use just one of these. We run 128 in production on
> > > this setting.
> > >
> > > blur.controller.server.remote.thread.count=64
> > >
> > > The number of remote calls to shard servers, meaning if you have 32
> > > shard servers a single query will use 32 of these threads. We run 2000
> > > in production on this setting.
> > >
> > > Aaron
> > >
> > >
> > > On Thu, Feb 14, 2013 at 9:17 AM, Tim Williams <[email protected]> wrote:
> > >
> > > > When an evil query (e.g. leading wildcard) are received, the
> > > > controllers become unresponsive until the query is either killed or
> > > > finished. Killing it is actually very difficult without responsive
> > > > controllers:( The odd things is, the controller server itself doesn't
> > > > seem to be under much load during that time. Anyone seen this before?
> > > >
> > > > Thanks,
> > > > --tim
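
For reference, Garrett's proposed sizing rule from the thread can be written out as a small calculation to see how quickly the numbers grow. This is only a sketch of his proposal, not a recommendation or anything Blur computes itself; the function name is made up, and "num shards" is read here as the number of shard servers.

```python
def proposed_thread_counts(concurrent_queries, shard_servers, controllers):
    """Apply the sizing rule proposed in the thread (hypothetical helper)."""
    # One controller thrift thread per concurrent query.
    thrift = concurrent_queries
    # Every query fans out to every shard server.
    remote = thrift * shard_servers
    # Every controller may send its full remote fan-out to the shard servers.
    shard = remote * controllers
    return {
        "blur.controller.server.thrift.thread.count": thrift,
        "blur.controller.server.remote.thread.count": remote,
        "blur.shard.server.thread.count": shard,
    }


# Example inputs: 100 concurrent queries, 50 shard servers, 4 controllers.
for name, value in proposed_thread_counts(100, 50, 4).items():
    print(f"{name}={value}")
```

With those example inputs the rule asks for 5000 remote threads per controller and 20000 threads per shard server, which illustrates the "numbers could get huge" concern; the production values quoted in the thread (128, 2000 and 128) are far smaller.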
