Guys, is it fair to say that YARN exposes an extension mechanism called the ApplicationMaster, and that out of the box the only AM Hadoop ships is the MapReduce one?
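For concreteness, here's the mental model I have of the AM side, a minimal sketch against the YARN AMRMClient API (the class name MinimalAppMaster and the empty host/URL arguments are just placeholders, not anything Samza actually does):

    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class MinimalAppMaster {
      public static void main(String[] args) throws Exception {
        // The AM is just a process the RM launches in a container; it talks
        // back to the ResourceManager through this client.
        AMRMClient<AMRMClient.ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(new YarnConfiguration());
        rm.start();

        // Tell the RM this application's master is up.
        rm.registerApplicationMaster("", 0, "");

        // A real AM would request containers here (addContainerRequest)
        // and launch its framework's tasks in them.

        rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
        rm.stop();
      }
    }

i.e. the AM is an ordinary process, and whatever scheduling logic it runs (MapReduce's split-based scheduling, or Samza's partition-based scheduling) is up to the framework.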
In Samza's case, we've implemented a streaming version of this AM which takes full advantage of the parallelism and fault-tolerance mechanisms built into Hadoop. So where MapReduce partitions its map tasks based on file splits in HDFS, we split a stream into StreamTasks based on the stream's partition key? Is this correct? Rough sketch of what I'm picturing below.
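To make the partitioning half of the question concrete, here's how I picture the task side, a minimal Samza StreamTask (MyTask is just an illustrative name): my understanding is that the Samza AM creates one task instance per input stream partition, so each instance only ever sees the messages whose partition key mapped to its partition.

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;

    public class MyTask implements StreamTask {
      @Override
      public void process(IncomingMessageEnvelope envelope,
                          MessageCollector collector,
                          TaskCoordinator coordinator) {
        // This instance receives every message from exactly one input
        // partition, the analogue of a map task's input split.
        Object key = envelope.getKey();
        Object message = envelope.getMessage();
        // ... per-message processing would go here ...
      }
    }

-S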
