Eliaaazzz commented on code in PR #37532:
URL: https://github.com/apache/beam/pull/37532#discussion_r3014266043
##########
sdks/python/apache_beam/transforms/util.py:
##########
@@ -1319,6 +1320,274 @@ def expand(self, pcoll):
self._batch_size_estimator, self._element_size_fn))
+def _default_element_size_fn(element: Any) -> int:
+  """Default element size function that tries len(), falls back to 1.
+
+  This function attempts to compute the size of an element using len().
+  If the element does not support len() (e.g., integers), it falls back to 1.
+
+  Args:
+    element: The element to compute the size of.
+
+  Returns:
+    The size of the element, or 1 if len() is not supported.
+  """
+  try:
+    return len(element)
+  except TypeError:
+    return 1
Review Comment:
> We should probably warn in this case since this is almost certainly not
> what the user desired. To avoid warning on every element, we can make this
> function a member of the `_SortAndBatchElementsDoFn` and use a member variable
> to track if we've already warned. That way we only warn once per DoFn instance.

Makes sense, will do. I'll move this into `_SortAndBatchElementsDoFn` as a
method and use an instance variable to track whether we've already warned, so
the warning fires at most once per DoFn instance.