We are in the early stages of evaluating Samza in a relatively 
resource-constrained environment.  One thing we cannot currently expect is 
more than a 1 gigabit local network, which our models indicate we will 
saturate in the naïve case.

One solution we are considering is to co-locate our highest-throughput jobs, 
the ones that consume directly from and filter high-throughput topics, on the 
same nodes that run the brokers for the applicable partitions of those 
topics.  The idea is that message delivery would never have to leave the 
loopback interface, and the output bandwidth of those jobs would be 
significantly smaller and more manageable.
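
As a rough sketch of the placement we have in mind, each filtering task would 
prefer the host of the broker that serves its input partition.  The function, 
task names, and host names below are hypothetical and only illustrate the 
mapping; they are not part of any Samza or YARN API.

```python
def preferred_hosts(partition_brokers, task_partitions):
    """Return a dict mapping each task to the host it should prefer.

    partition_brokers: partition id -> host of the broker serving it (hypothetical input)
    task_partitions:   task name -> partition id the task consumes (hypothetical input)
    A task whose partition has no known broker host maps to None (no preference).
    """
    return {
        task: partition_brokers.get(partition)
        for task, partition in task_partitions.items()
    }


# Illustrative data only: two brokers, three filtering tasks.
brokers = {0: "broker-a", 1: "broker-b"}
tasks = {"filter-0": 0, "filter-1": 1, "filter-2": 2}
placement = preferred_hosts(brokers, tasks)
```

Here "filter-0" and "filter-1" would be placed next to their brokers, while 
"filter-2" has no known broker host and would be scheduled anywhere.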

It seems like this is something the ApplicationMaster would have to coordinate 
with YARN, and it closely resembles how YARN allocates compute resources near 
data stored in HDFS.  Is there anything in the ApplicationMaster that would 
allow us to do this today?  Or would the proper approach be to run those jobs 
directly, outside of a YARN grid, and have the YARN jobs read from the output 
of those direct jobs?

-Bart


