GitHub user nickpan47 opened a pull request: https://github.com/apache/samza/pull/475
SAMZA-1659: Serializable OperatorSpec This change is to make the user supplied functions serializable. Hence, making the full user defined DAG serializable. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nickpan47/samza serializable-opspec-only-Jan-24-18 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/samza/pull/475.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #475 ---- commit 5573a069e95f6467b92d73b3cba37f035e067ae2 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-06-12T04:20:36Z WIP: new api revision commit 8bb975204bd8a5ad3459642609c9e4992c495701 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-06-12T07:38:40Z WIP: proto-type of input/output stream/system specs commit aeb457303779e7f31b114ad23bfa9ebaafeddab6 Author: Xinyu Liu <xiliu@...> Date: 2017-06-29T00:16:10Z SAMZA-1321: Propagate end-of-stream and watermark messages The patch completes the end-of-stream work flow across multi-stage pipeline. It also contains initial commit for supporting watermarks. For watermark, there are issues raised in the review feedback and will be addressed by further prs. The main logic this patch adds: - EndOfStreamManager aggregates the end-of-stream control messages, propagate the result to to downstream intermediate topics based on the topology of the IO in the StreamGraph. - WatermarkManager aggregates the watermark control messages from the upstage tasks, pass it through the operators, and propagate it to downstream. In operator impl, I implemented similar watermark logic as Beam for watermark propagation: * InputWatermark(op) = min { OutputWatermark(op') | op1 is upstream of op} * OutputWatermark(op) = min { InputWatermark(op), OldestWorkTime(op) } Add quite a few unit tests and integration test. The code is 100% covered as reported by Intellij. Both control messages work as expected. Author: Xinyu Liu <xi...@xiliu-ld.linkedin.biz> Reviewers: Yi Pan <nickpa...@gmail.com> Closes #236 from xinyuiscool/SAMZA-1321 commit ae3dc6ff133fa3f0c7d9f27b6f724c652105680c Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-06-12T04:20:36Z WIP: new api revision commit 91f364f1e3e350c4c9d6413e620d9506a1da91c8 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-06-12T07:38:40Z WIP: proto-type of input/output stream/system specs commit b898e6c037e4fb4c9ddf1baee95095045e6500ab Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-06-13T01:43:50Z WIP: new-api-v2 commit cd528c1c30646a3307c81770f3dc82482bba1aba Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-06-21T16:12:41Z WIP: updated spec and user DAG API commit 0bc7ee7babd0a64ed4d7cf6b7fab24be3a3447ae Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-06-30T16:55:09Z WIP: update the user code example on new APIs commit 51541e133216b9c730f57653f5f7afb6c56d21a5 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-07-11T16:07:32Z WIP: cleanup StreamDescriptor commit 3c50629eb93b674731398f9f8c756e499da5faf2 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-07-27T22:05:07Z WIP: adding support for low-level task APIs commit 4a6a58dcb80fb2919f5bad42f6f1979960826d62 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-07-31T09:51:30Z WIP: updated w/ low-level task API and global var ingestion/metrics reporter commit 256155ad530c48af0cf60ba8945e8face0133a90 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-07-31T17:29:22Z WIP: new API user code examples commit f227380f20503734decca75c0f34aa5810b7f1c0 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-08-02T07:50:54Z WIP: update the app runner classes commit e6fb96e574aa9ee78b3beedd6921e4a680fa9c73 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-08-08T14:35:39Z WIP: merged all application types into StreamApplications commit 525d8bc1b96149299ad831d1ba3d458d6dd3e4b1 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-08-09T18:02:31Z Merge branch '0.14.0' into new-api-v2 Conflicts: samza-core/src/main/java/org/apache/samza/operators/StreamGraphImpl.java samza-core/src/main/java/org/apache/samza/operators/impl/OutputOperatorImpl.java samza-core/src/main/java/org/apache/samza/processor/StreamProcessor.java samza-core/src/main/java/org/apache/samza/runtime/LocalContainerRunner.java commit 6fc6d4c09d13132ae3c915de6bb6223206cb4ca2 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-08-25T16:49:18Z WIP: experiment code to implement an end-to-end working example for new APIs commit d7df6ed0ef3f6ca8ad5ee4cb7e7307c314a71619 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-08-31T22:41:41Z WIP: added all unit test for OperatorSpec#copy methods. commit dde1ab14bee0ff4ebbc15500a938b191295dbd90 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-09-06T16:43:37Z WIP: first end-to-end test commit 50201728e964cbe3c997b4e335df3cd2a46ed076 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-10-19T23:48:56Z Merge branch 'experiment-new-api-v2' into new-api-v2-0.14 Conflicts: samza-api/src/main/java/org/apache/samza/operators/MessageStream.java samza-api/src/main/java/org/apache/samza/operators/StreamGraph.java samza-api/src/main/java/org/apache/samza/operators/windows/Windows.java samza-api/src/main/java/org/apache/samza/operators/windows/internal/WindowInternal.java samza-api/src/main/java/org/apache/samza/serializers/Serde.java samza-api/src/main/java/org/apache/samza/system/StreamSpec.java samza-core/src/main/java/org/apache/samza/operators/MessageStreamImpl.java samza-core/src/main/java/org/apache/samza/operators/StreamGraphImpl.java samza-core/src/main/java/org/apache/samza/operators/functions/PartialJoinFunction.java samza-core/src/main/java/org/apache/samza/operators/impl/InputOperatorImpl.java samza-core/src/main/java/org/apache/samza/operators/impl/OperatorImplGraph.java samza-core/src/main/java/org/apache/samza/operators/impl/OutputOperatorImpl.java samza-core/src/main/java/org/apache/samza/operators/impl/WindowOperatorImpl.java samza-core/src/main/java/org/apache/samza/operators/spec/InputOperatorSpec.java samza-core/src/main/java/org/apache/samza/operators/spec/JoinOperatorSpec.java samza-core/src/main/java/org/apache/samza/operators/spec/OperatorSpec.java samza-core/src/main/java/org/apache/samza/operators/spec/OperatorSpecs.java samza-core/src/main/java/org/apache/samza/operators/spec/OutputOperatorSpec.java samza-core/src/main/java/org/apache/samza/operators/spec/OutputStreamImpl.java samza-core/src/main/java/org/apache/samza/operators/spec/SinkOperatorSpec.java samza-core/src/main/java/org/apache/samza/operators/spec/StreamOperatorSpec.java samza-core/src/main/java/org/apache/samza/operators/spec/WindowOperatorSpec.java samza-core/src/main/java/org/apache/samza/runtime/AbstractApplicationRunner.java samza-core/src/main/java/org/apache/samza/runtime/LocalApplicationRunner.java samza-core/src/main/java/org/apache/samza/runtime/RemoteApplicationRunner.java samza-core/src/main/java/org/apache/samza/task/StreamOperatorTask.java samza-core/src/test/java/org/apache/samza/control/TestEndOfStreamManager.java samza-core/src/test/java/org/apache/samza/control/TestIOGraph.java samza-core/src/test/java/org/apache/samza/control/TestWatermarkManager.java samza-core/src/test/java/org/apache/samza/example/BroadcastExample.java samza-core/src/test/java/org/apache/samza/example/MergeExample.java samza-core/src/test/java/org/apache/samza/execution/TestExecutionPlanner.java samza-core/src/test/java/org/apache/samza/execution/TestJobGraphJsonGenerator.java samza-core/src/test/java/org/apache/samza/operators/TestJoinOperator.java samza-core/src/test/java/org/apache/samza/operators/TestMessageStreamImpl.java samza-core/src/test/java/org/apache/samza/operators/TestStreamGraphImpl.java samza-core/src/test/java/org/apache/samza/operators/TestWindowOperator.java samza-core/src/test/java/org/apache/samza/operators/impl/TestOperatorImpl.java samza-core/src/test/java/org/apache/samza/operators/impl/TestOperatorImplGraph.java samza-core/src/test/java/org/apache/samza/operators/spec/TestWindowOperatorSpec.java samza-core/src/test/java/org/apache/samza/runtime/TestApplicationRunnerMain.java samza-core/src/test/java/org/apache/samza/runtime/TestLocalApplicationRunner.java samza-test/src/main/java/org/apache/samza/example/KeyValueStoreExample.java samza-test/src/main/java/org/apache/samza/example/OrderShipmentJoinExample.java samza-test/src/main/java/org/apache/samza/example/PageViewCounterExample.java samza-test/src/main/java/org/apache/samza/example/RepartitionExample.java samza-test/src/main/java/org/apache/samza/example/WindowExample.java samza-test/src/test/java/org/apache/samza/test/controlmessages/EndOfStreamIntegrationTest.java samza-test/src/test/java/org/apache/samza/test/controlmessages/WatermarkIntegrationTest.java samza-test/src/test/java/org/apache/samza/test/operator/RepartitionWindowApp.java samza-test/src/test/java/org/apache/samza/test/operator/SessionWindowApp.java samza-test/src/test/java/org/apache/samza/test/operator/StreamApplicationIntegrationTestHarness.java samza-test/src/test/java/org/apache/samza/test/operator/TestRepartitionWindowApp.java samza-test/src/test/java/org/apache/samza/test/operator/TumblingWindowApp.java samza-test/src/test/java/org/apache/samza/test/processor/TestZkLocalApplicationRunner.java commit bf1ce907645e4a131c5765c4612c158e1855730a Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-10-20T00:43:33Z WIP: removing StreamDescriptor first commit d46403294aecc2ebdbc3a890da8c40acfa35f448 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-10-23T16:05:48Z WIP: fixing unit tests after merge commit 6a14b2afad8d57cccb17e5926cd44743b6ebd034 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-10-30T22:25:03Z WIP: fixed unit test failure for Windows commit 475a46bced8d84c7fe09b57590d3a72e0587c3c8 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-12-15T18:59:29Z WIP: fixed TestZkLocalApplicationRunner. Debugging issues w/ TestRepartitionWindowApp (i.e. missing changelog creation step when directly running LocalApplicationRunner) commit dc7da87e2eec618bb060acfec695c5b1aa280259 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2017-12-22T18:27:21Z WIP: unit tests for serialization commit 4102aa8ced2b5630edf3ac84e0c6298956a999b5 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2018-01-02T17:36:29Z WIP: continued working on potential offspring integration commit 1670aff0cefb569b109fad3928f04190a7b4e0a1 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2018-01-10T22:20:22Z WIP: class-loading of user program logic and main() method based user program logic are both included in ThreadJobFactory/ProcessJobFactory/YarnJobFactory. ThreadJobFactory test suite to be fixed. commit 0ebebfc3ba49de6d6589cd3b82eb8ce3882d034d Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2018-01-19T23:11:26Z WIP: serialization only change commit aca423085ed29452adc89c48aee4aa1126874328 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2018-01-25T07:52:16Z WIP: Serialize OperatorSpec only w/o StreamApplication interface change. Passed all build and tests. commit b973b105ddb8cd49ea5f084ceebea19b27254f36 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2018-02-08T08:12:58Z Merge branch 'master' into serializable-opspec-only-Jan-24-18 Conflicts: samza-api/src/main/java/org/apache/samza/operators/MessageStream.java samza-core/src/main/java/org/apache/samza/operators/MessageStreamImpl.java samza-core/src/main/java/org/apache/samza/operators/StreamGraphImpl.java samza-core/src/main/java/org/apache/samza/operators/impl/OperatorImpl.java samza-core/src/main/java/org/apache/samza/operators/impl/OperatorImplGraph.java samza-core/src/main/java/org/apache/samza/operators/impl/OutputOperatorImpl.java samza-core/src/main/java/org/apache/samza/operators/impl/PartialJoinOperatorImpl.java samza-core/src/main/java/org/apache/samza/operators/impl/PartitionByOperatorImpl.java samza-core/src/main/java/org/apache/samza/operators/impl/WindowOperatorImpl.java samza-core/src/main/java/org/apache/samza/operators/spec/InputOperatorSpec.java samza-core/src/main/java/org/apache/samza/operators/spec/OperatorSpec.java samza-core/src/main/java/org/apache/samza/operators/spec/OperatorSpecs.java samza-core/src/main/java/org/apache/samza/operators/spec/OutputStreamImpl.java samza-core/src/main/java/org/apache/samza/operators/spec/PartitionByOperatorSpec.java samza-core/src/main/java/org/apache/samza/operators/spec/StreamOperatorSpec.java samza-core/src/main/java/org/apache/samza/operators/stream/IntermediateMessageStreamImpl.java samza-core/src/main/java/org/apache/samza/runtime/LocalApplicationRunner.java samza-core/src/main/scala/org/apache/samza/coordinator/JobModelManager.scala samza-core/src/test/java/org/apache/samza/execution/TestExecutionPlanner.java samza-core/src/test/java/org/apache/samza/execution/TestJobGraphJsonGenerator.java samza-core/src/test/java/org/apache/samza/operators/TestJoinOperator.java samza-core/src/test/java/org/apache/samza/operators/TestMessageStreamImpl.java samza-core/src/test/java/org/apache/samza/operators/TestStreamGraphImpl.java samza-core/src/test/java/org/apache/samza/operators/impl/TestOperatorImpl.java samza-core/src/test/java/org/apache/samza/operators/impl/TestOperatorImplGraph.java samza-core/src/test/java/org/apache/samza/operators/impl/TestWindowOperator.java samza-core/src/test/java/org/apache/samza/operators/spec/TestWindowOperatorSpec.java samza-test/src/main/java/org/apache/samza/example/KeyValueStoreExample.java samza-test/src/main/java/org/apache/samza/example/OrderShipmentJoinExample.java samza-test/src/main/java/org/apache/samza/example/PageViewCounterExample.java samza-test/src/main/java/org/apache/samza/example/RepartitionExample.java samza-test/src/main/java/org/apache/samza/example/WindowExample.java samza-test/src/test/java/org/apache/samza/test/controlmessages/EndOfStreamIntegrationTest.java samza-test/src/test/java/org/apache/samza/test/controlmessages/WatermarkIntegrationTest.java samza-test/src/test/java/org/apache/samza/test/operator/RepartitionJoinWindowApp.java samza-test/src/test/java/org/apache/samza/test/operator/SessionWindowApp.java samza-test/src/test/java/org/apache/samza/test/operator/TestRepartitionJoinWindowApp.java samza-test/src/test/java/org/apache/samza/test/operator/TumblingWindowApp.java samza-test/src/test/java/org/apache/samza/test/processor/TestZkLocalApplicationRunner.java commit 7c8d1591cef6326f32b2ffd061cc52b902e37214 Author: Yi Pan (Data Infrastructure) <nickpan47@...> Date: 2018-02-12T08:29:26Z WIP: working on unit tests for trigger, broadcast, join, table, and SQL UDF function serialization ---- ---