We have an application that ingests data, funnels it through one server, and then sprays it out to a distributed cluster. The data in the cluster is sharded, and each shard is replicated twice within the cluster. Any given node holds a few dozen shards, and we are planning to increase the number of data shards so that there will be hundreds of them across the cluster at any given time. We need to get the data to the cluster with very low latency. Nodes can go down and shards can get redistributed, so we can't easily map between a message and the nodes that need it (the consuming nodes may change between when the message is written and when it needs to be consumed).
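
To make the routing problem concrete, roughly think of shard assignment as something like the hash-based sketch below (the names are made up purely for illustration); the shard-to-node assignment layered on top of it is the part that keeps changing, which is why the writer can't target specific nodes up front.

    final class ShardMap {
        // Hypothetical sketch of a hash-based key-to-shard mapping; the
        // shard-to-node assignment built on top of this moves around as
        // nodes fail and shards get redistributed.
        static int shardFor(String messageKey, int numShards) {
            return Math.floorMod(messageKey.hashCode(), numShards);
        }
    }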

We are evaluating Kafka for use in routing our incoming posts to the clustered 
servers.  It looks like Kafka supports broker-side filtering (true?), but the 
API docs are a little sparse.

Which would make more sense to implement:


1. Write to several hundred queues and have each clustered server read from the few dozen it owns (either blocking on one thread per queue, or timing out and round-robining between queues).

2. Write to a much smaller number of queues and have each clustered server filter for the content it needs (a rough sketch of what we're picturing follows below).
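
To make option 2 concrete, below is a rough sketch of what we're picturing, using the Kafka Java consumer client. The topic name, group id, and shard keys are made up for illustration, and all of the filtering happens on our side (whether any of it can be pushed to the broker is exactly the question above). Option 1 would look much the same, except each server would subscribe to the per-shard topics it owns and the filter would disappear.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import java.util.Set;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class FilteringShardConsumer {
        public static void main(String[] args) {
            // Shards this particular cluster node currently owns (hypothetical ids).
            Set<String> myShards = Set.of("shard-17", "shard-42", "shard-63");

            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            // One consumer group per node, so every node sees the full stream
            // and keeps only the shards it owns.
            props.put("group.id", "cluster-node-7");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("posts"));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(50));
                    for (ConsumerRecord<String, String> record : records) {
                        // Client-side filter: the writer keys each message with its
                        // shard id, and the consumer drops anything it doesn't own.
                        if (myShards.contains(record.key())) {
                            apply(record.value());
                        }
                    }
                }
            }
        }

        private static void apply(String post) {
            // Apply the post to the local shard replica (details omitted).
        }
    }

The obvious cost of this version is that every node pulls and deserializes the whole stream even though it only keeps a few dozen shards' worth of it.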

And of course the story wouldn't be complete if there weren't other consumers 
needing to read the same data but without the strict latency requirements.

Any suggestions?

Bob Jervis | Senior Architect

Visible Technologies<http://www.visibletechnologies.com/>
Seattle | Boston | New York | London
Phone: 425.957.6075 | Fax: 781.404.5711

Follow Visibly Intelligent Blog<http://www.visibletechnologies.com/blog/>

