----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50143/ -----------------------------------------------------------
Review request for samza, Boris Shkolnik, Chris Pettitt, Fred Ji, Jake Maes, Yi Pan (Data Infrastructure), Navina Ramesh, and Xinyu Liu. Repository: samza Description ------- Samza currently works with unbounded data sources. However, for bounded data sources like HDFS files, snapshot files which are not infinite, we need a notion of 'end-of-stream'. The following are the logical tasks: 1. SystemConsumer will indicate to Samza that the end of stream has been reached for an SSP. (by constructing an envelope with eof set to true) 2. Samza will shut down the task if all SSPs in the task are at end of stream. 3. Samza will provide a callback to the task so that it can perform cleanups/ commits once tasks are at end of stream. 4. Samza will shut down the container if all tasks in the container have been shut down. 5. Samza will ultimately shut down the job if all containers in the job have been shut down. This is a step towards realizing a 'finite' Samza job that terminates (as opposed to an infinite stream job that keeps running) once data processing is complete. === This RB is an RFC for design feedback ==== TODO: 1. Add more unit tests 2. Verify behavior with multiple containers Diffs ----- samza-api/src/main/java/org/apache/samza/system/IncomingMessageEnvelope.java cc860cf7eb4d514736913c1dceaa80534b61d71a samza-api/src/main/java/org/apache/samza/task/EndOfStreamListenerTask.java PRE-CREATION samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala d32a92976e43ca24033b48c91851ee706de7de6b samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala e280daa9626757cb4d17c0c03eed923277230c3e Diff: https://reviews.apache.org/r/50143/diff/ Testing ------- Added an unit test and verified that an End of stream message terminates the runloop. Thanks, Jagadish Venkatraman