claudevdm commented on code in PR #38015:
URL: https://github.com/apache/beam/pull/38015#discussion_r3017319217
##########
sdks/python/apache_beam/io/gcp/bigquery_change_history.py:
##########
@@ -730,16 +738,75 @@ def _ensure_client(self) -> None:
def setup(self) -> None:
self._ensure_client()
+ def _split_all_streams(
+ self, stream_names: Tuple[str, ...],
+ max_split_rounds: int) -> Tuple[str, ...]:
+ """Split each stream at fraction=0.5 for up to max_split_rounds rounds.
+
+ Each round attempts to split every stream in the current list. A
+ successful split replaces the original stream with primary + remainder.
+ A refused split (both fields empty) keeps the original stream intact.
+ Stops when max_split_rounds is reached or a full round produces zero
+ new splits.
+
+ BQ's server-side granularity controls how many splits are possible.
+ Small tables may not split at all; large tables may allow multiple
+ rounds of doubling.
+ """
+ result = list(stream_names)
+ for round_num in range(1, max_split_rounds + 1):
+ new_result = []
+ made_progress = False
+ for name in result:
+ response = self._storage_client.split_read_stream(
+ request=bq_storage.types.SplitReadStreamRequest(
+ name=name, fraction=0.5))
Review Comment:
This wont happen and can be controlled. BQ only allows splitting a very
limted of times
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]