Hi Tathagata,

Many thanks for the extended answer and the clarifications on the Kafka data distribution in the cluster.
There are many points to handle, so, to start somewhere:

> > Case (ii) could have been implemented as an actor, as it just inserts a
> > record into an ArrayBuffer (i.e. a very small task). However, with rates of
> > more than 100K records received per second, I was unsure what the overhead
> > of sending each record as a message through the actor library would be
> > like.
>
> I'm personally curious about this point.

I could investigate by creating a simplified test scenario that isolates the data accumulator case and compares the performance of both models (actors vs. threads with proper locking) under different levels of concurrency. Do you think this could be helpful for the project? I'm looking to contribute, and this could be an interesting starting point.

> I probably went into more detail than you wanted to know. :)

Absolutely not. The more, the better :-)

-kr, Gerard.
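A rough sketch of the accumulator test described above could look like the following. It rests on several assumptions: it uses plain Java with no Akka dependency, a `LinkedBlockingQueue` drained by a single consumer thread stands in for an actor mailbox, an `ArrayList` stands in for the Scala `ArrayBuffer`, and all class and method names (`AccumulatorBench`, `LockedAccumulator`, `MailboxAccumulator`, `run`) are hypothetical. It only illustrates the shape of the comparison, not how Spark's receivers actually work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AccumulatorBench {

    // Common interface so both models can be driven by the same harness.
    interface Accumulator {
        void add(long value) throws InterruptedException;
        int finish() throws InterruptedException; // returns the number of stored records
    }

    // Model 1: producer threads append directly to a shared buffer under a lock.
    static final class LockedAccumulator implements Accumulator {
        private final List<Long> buffer = new ArrayList<>();
        public void add(long value) {
            synchronized (buffer) { buffer.add(value); }
        }
        public int finish() {
            synchronized (buffer) { return buffer.size(); }
        }
    }

    // Model 2 (actor-style stand-in): every record is sent as a message to a mailbox;
    // a single consumer thread owns the buffer, so the buffer itself needs no lock.
    static final class MailboxAccumulator implements Accumulator {
        private static final long POISON = Long.MIN_VALUE; // hypothetical end-of-stream marker
        private final BlockingQueue<Long> mailbox = new LinkedBlockingQueue<>();
        private final List<Long> buffer = new ArrayList<>();
        private final Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    long r = mailbox.take();
                    if (r == POISON) return;
                    buffer.add(r);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        MailboxAccumulator() { consumer.start(); }
        public void add(long value) throws InterruptedException { mailbox.put(value); }
        public int finish() throws InterruptedException {
            mailbox.put(POISON);
            consumer.join(); // join() makes the consumer's writes to buffer visible here
            return buffer.size();
        }
    }

    // Starts `producers` threads, each sending `perProducer` records.
    // Returns {storedRecords, elapsedNanos}.
    static long[] run(Accumulator acc, int producers, int perProducer) throws InterruptedException {
        Thread[] threads = new Thread[producers];
        long start = System.nanoTime();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            threads[p] = new Thread(() -> {
                try {
                    for (int i = 0; i < perProducer; i++) acc.add((long) id * perProducer + i);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) t.join();
        int stored = acc.finish();
        return new long[] { stored, System.nanoTime() - start };
    }

    public static void main(String[] args) throws InterruptedException {
        int perProducer = 100_000; // roughly the per-second rate mentioned in this thread
        for (int producers : new int[] {1, 2, 4, 8}) {
            long[] locked = run(new LockedAccumulator(), producers, perProducer);
            long[] mailbox = run(new MailboxAccumulator(), producers, perProducer);
            System.out.printf("producers=%d locked=%d ms (%d recs) mailbox=%d ms (%d recs)%n",
                producers, locked[1] / 1_000_000, locked[0], mailbox[1] / 1_000_000, mailbox[0]);
        }
    }
}
```

A naive timing loop like this ignores JIT warm-up and GC pauses, so for numbers worth reporting, a harness such as JMH would probably be the better choice.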