After reading what Aaron said, I'm picturing something like:

blur.controller.server.thrift.thread.count = (num of concurrent queries)
blur.controller.server.remote.thread.count = (blur.controller.server.thrift.thread.count * num shards)
blur.shard.server.thread.count = (blur.controller.server.remote.thread.count * num controllers)

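To get a feel for how quickly these numbers grow, here is a rough Python sketch of the arithmetic above. It is not part of Blur: the function names and the example cluster size are made up for illustration, and it assumes "num shards" means the number of shard servers, since Aaron describes one remote thread per shard server per query.

```python
def suggest_thread_counts(concurrent_queries, num_shard_servers, num_controllers):
    """Apply the three proposed formulas; returns a dict keyed by property name.

    Hypothetical helper for illustration only, not a Blur API.
    """
    # One controller thrift thread per in-flight query.
    thrift = concurrent_queries
    # Each query fans out to every shard server (assumption: "num shards"
    # in the formula means shard servers).
    remote = thrift * num_shard_servers
    # Proposed sizing for the shard side: remote threads times controllers.
    shard = remote * num_controllers
    return {
        "blur.controller.server.thrift.thread.count": thrift,
        "blur.controller.server.remote.thread.count": remote,
        "blur.shard.server.thread.count": shard,
    }


def queries_before_remote_pool_is_full(remote_threads, num_shard_servers):
    """How many queries can fan out at once before the remote pool is exhausted."""
    return remote_threads // num_shard_servers


if __name__ == "__main__":
    # Example cluster (made-up sizes): 32 concurrent queries,
    # 32 shard servers, 2 controllers.
    for name, value in suggest_thread_counts(32, 32, 2).items():
        print(f"{name}={value}")
    # With the default of 64 remote threads and 32 shard servers:
    print(queries_before_remote_pool_is_full(64, 32))
```

With the default of 64 remote threads and 32 shard servers, only two queries can fan out at the same time, which would fit the thread starvation guess if a couple of slow queries hold those threads.
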
I didn't really know what they were, so I had rounded everything to 100. I think the last 2 need to be much higher now, though these numbers could get huge with a decent-sized cluster. What do you think, Aaron?

On Thu, Feb 14, 2013 at 8:09 PM, Tim Williams <[email protected]> wrote:
> Thanks, I think that helps. I assume there's a relationship between
> these and blur.shard.server.thrift.thread.count? Maybe in practice
> blur.shard.server.thread.count should at least be equivalent to the
> controller incoming thrift thread limit?
>
> Thanks,
> --tim
>
> On Thu, Feb 14, 2013 at 9:26 AM, Aaron McCurry <[email protected]> wrote:
> > My guess is that it's thread starvation in the controllers. See file:
> >
> > https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-util/src/main/resources/blur-default.properties;h=50900056a7507528f1f71d645ce84d5246f6892b;hb=b89d456411e0a184dee1a63709ba7c175ec4dcef
> >
> > blur.controller.server.thrift.thread.count=32
> >
> > The number of thrift requests that the controller can handle, meaning a
> > single query will use just one of these. We run 128 in production on this
> > setting.
> >
> > blur.controller.server.remote.thread.count=64
> >
> > The number of remote calls to shard servers, meaning if you have 32 shard
> > servers a single query will use 32 of these threads. We run 2000 in
> > production on this setting.
> >
> > Aaron
> >
> > On Thu, Feb 14, 2013 at 9:17 AM, Tim Williams <[email protected]> wrote:
> >
> >> When an evil query (e.g. leading wildcard) is received, the
> >> controllers become unresponsive until the query is either killed or
> >> finished. Killing it is actually very difficult without responsive
> >> controllers :( The odd thing is, the controller server itself doesn't
> >> seem to be under much load during that time. Anyone seen this before?
> >>
> >> Thanks,
> >> --tim
