Re: [PR] [Dataflow Streaming] Add a config to use multiple commit threads [beam]

via GitHub Fri, 02 Feb 2024 01:01:35 -0800


scwhittle commented on code in PR #30194:
URL: https://github.com/apache/beam/pull/30194#discussion_r1475759425



##########
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java:
##########
@@ -260,6 +260,12 @@ public Dataflow create(PipelineOptions options) {
 
   void setstreamingSideInputCacheExpirationMillis(Integer value);
 
+  @Description("Number of commit threads used to commit items to streaming engine.")
+  @Default.Integer(1)
+  Integer getWindmillServiceCommitThreads();

Review Comment:
   You could add a test to StreamingDataflowWorkerTest.java.
   If you just want to cover that it functions, you could set the option 
higher and run something like testBasic, which expects 2000 commits.
   
   If you want to more explicitly test the benefits of multiple commit streams, 
you could modify the CommitWorkStream returned by FakeWindmillServer to support 
blocking on a large commit (simulating it being slow). Then you could have a 
test in StreamingDataflowWorkerTest with one large commit and several smaller 
commits, and verify that the smaller commits are received even while the large 
one is blocking.
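
   A minimal, self-contained sketch of the blocking idea above, using only 
java.util.concurrent. It does not touch the real FakeWindmillServer or 
CommitWorkStream: `BlockingCommitSketch`, `Commit`, `BlockingFakeCommitStream` 
and the size threshold are all hypothetical names made up for illustration. It 
only shows the shape of the assertion: with more than one commit thread, small 
commits arrive while the large one is still stalled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class BlockingCommitSketch {
  // Assumption: commits at or above this size count as "large" and are stalled.
  static final int LARGE_THRESHOLD_BYTES = 1_000_000;

  /** Hypothetical stand-in for a commit request; not a Beam or Windmill type. */
  static final class Commit {
    final String id;
    final int sizeBytes;

    Commit(String id, int sizeBytes) {
      this.id = id;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Hypothetical fake stream that blocks on a large commit until released. */
  static final class BlockingFakeCommitStream {
    final CountDownLatch largeStarted = new CountDownLatch(1);
    final CountDownLatch releaseLarge = new CountDownLatch(1);
    final List<String> received = new CopyOnWriteArrayList<>();

    void commit(Commit c) throws InterruptedException {
      if (c.sizeBytes >= LARGE_THRESHOLD_BYTES) {
        largeStarted.countDown(); // signal that the slow commit is in flight
        releaseLarge.await(); // simulate a slow commit
      }
      received.add(c.id);
    }
  }

  /** Returns the commit ids received while the large commit was still blocked. */
  static List<String> receivedWhileLargeBlocked(
      int commitThreads, int smallCommits, long timeoutMillis) throws InterruptedException {
    BlockingFakeCommitStream stream = new BlockingFakeCommitStream();
    BlockingQueue<Commit> queue = new LinkedBlockingQueue<>();
    CountDownLatch smallDone = new CountDownLatch(smallCommits);
    ExecutorService pool = Executors.newFixedThreadPool(commitThreads);
    for (int i = 0; i < commitThreads; i++) {
      pool.execute(
          () -> {
            try {
              while (true) {
                Commit c = queue.take();
                stream.commit(c);
                if (c.sizeBytes < LARGE_THRESHOLD_BYTES) {
                  smallDone.countDown();
                }
              }
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt(); // exit the worker loop
            }
          });
    }
    queue.put(new Commit("large", 2 * LARGE_THRESHOLD_BYTES));
    stream.largeStarted.await(); // one thread is now stuck on the large commit
    for (int i = 0; i < smallCommits; i++) {
      queue.put(new Commit("small-" + i, 100));
    }
    smallDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
    List<String> snapshot = new ArrayList<>(stream.received);
    stream.releaseLarge.countDown(); // unblock the large commit, then clean up
    pool.shutdownNow();
    pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    return snapshot;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("2 threads: " + receivedWhileLargeBlocked(2, 5, 5000));
    System.out.println("1 thread:  " + receivedWhileLargeBlocked(1, 5, 300));
  }
}
```

   With a single commit thread the snapshot stays empty until the large commit 
is released, which is the contrast a real test in StreamingDataflowWorkerTest 
would want to assert on.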



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
