lostluck commented on code in PR #32526:
URL: https://github.com/apache/beam/pull/32526#discussion_r1769453920
##########
sdks/go/pkg/beam/runners/prism/internal/stage.go:
##########
@@ -220,12 +248,28 @@ progress:
Data: residuals,
})
}
+
+ // Any split means we're processing slower than desired, but splitting should increase
Review Comment:
Great question.
The answer comes from looking at the issue being fixed and the behavior
it produced.
In this case, the file was being opened repeatedly and endlessly*.
Splitting too quickly creates more work: every split means serializing and
deserializing the elements still waiting to be processed by the SDK.
So we weren't letting the SDK actually get any work done.
Because opening the file was slow, we kept splitting before the open
finished, and then opened the file again and again in different bundles.
*Eventually there would be nothing left to split, and lines would be
emitted, but it would have been very wasteful.