lostluck commented on code in PR #32526:
URL: https://github.com/apache/beam/pull/32526#discussion_r1769453920


##########
sdks/go/pkg/beam/runners/prism/internal/stage.go:
##########
@@ -220,12 +248,28 @@ progress:
                                                Data: residuals,
                                        })
                                }
+
+                               // Any split means we're processing slower than desired, but splitting should increase

Review Comment:
   Great question.
   
   That can be answered by looking at the issue being fixed, and seeing the 
behavior.
   
   In this case, the file was being opened repeatedly and endlessly*.
   
   By splitting too quickly, we end up doing more work: serializing and deserializing the elements that are still waiting to be processed by the SDK. So we weren't letting the SDK actually get any work done.
   
   This meant that because we were slow to open the file, we opened it again and again in different bundles.
   
   *Eventually there would be nothing left to split, and lines would be 
emitted, but it would have been very wasteful.


