Currently, Samza works with streaming input sources such as Kafka topics. This proposal introduces a notion of 'end-of-stream' into Samza to support bounded data sources, such as HDFS files and snapshots on disk.
Proposal: https://issues.apache.org/jira/secure/attachment/12825119/ProposalforEndofStreaminSamza.pdf
This is tracked in SAMZA-974.

-- Jagadish V, Graduate Student, Department of Computer Science, Stanford University