Yes, it does (or can) at high load. At this point I would recommend that users run a single small controller (around a 1G heap) on every shard server to spread the effect out over a larger number of servers. Another thing that can be done is to increase the buffer size on the NICs of every node (this can help but doesn't fix the problem). Also, as you would expect, as the shard cluster size increases, the potential for Incast to become a problem increases as well (more shard servers responding).

To help with this problem I have begun thinking about the next layer in Blur to create even larger clusters. At this point I have tested and run Blur in 100+ server clusters, but due to the architecture of search the fanout has its limits. If we were to create another (optional) layer to act as a super controller to the controllers for each shard cluster, that could help with fanout. So if we think the limit for fanout is around 100 servers and we add another layer to the fanout, then we could run 10,000 node clusters (100 shard clusters of roughly 100 servers each).
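
To make the fanout arithmetic concrete, here is a rough sketch in Python. It is only back-of-the-envelope math, not anything that exists in Blur: the function name `max_shard_servers` and the assumption that every layer has the same fanout limit are mine, purely for illustration.

```python
def max_shard_servers(fanout_limit: int, layers: int) -> int:
    """Upper bound on shard servers reachable through `layers` levels of fanout.

    Assumes (hypothetically) that every node at every layer can fan out to
    at most `fanout_limit` nodes in the layer below it.
    """
    if fanout_limit < 1 or layers < 1:
        raise ValueError("fanout_limit and layers must both be >= 1")
    return fanout_limit ** layers


# One layer: a controller talks directly to the shard servers.
single_layer = max_shard_servers(fanout_limit=100, layers=1)

# Two layers: a super controller talks to per-cluster controllers,
# and each of those talks to its own shard servers.
two_layers = max_shard_servers(fanout_limit=100, layers=2)

print(single_layer, two_layers)
```

The point of the extra layer is that no single node ever waits on more than about 100 simultaneous responses, even though the total number of shard servers grows to 100 x 100.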
Of course this is just an idea at this point, but I think it could be the next step for server growth.

Aaron

On Sat, Feb 22, 2014 at 6:31 PM, rahul challapalli <[email protected]> wrote:

> Hi All,
>
> I was just reading about TCP Incast and started to wonder if blur could
> also be similarly affected. When someone wants to view the top 100 search
> results, the controller might as well receive the 100 results (though not
> actual data) simultaneously from all shard servers resulting in a similar
> situation. Are we already handling this somewhere? Any thoughts are
> appreciated. Thank You.
>
> - Rahul
