gemini-code-assist[bot] commented on code in PR #35753:
URL: https://github.com/apache/beam/pull/35753#discussion_r2245972899
##########
sdks/python/apache_beam/yaml/yaml_ml.py:
##########
@@ -32,37 +29,14 @@
from apache_beam.yaml import options
from apache_beam.yaml.yaml_utils import SafeLineLoader
-
-def _list_submodules(package):
- """
- Lists all submodules within a given package.
- """
- submodules = []
- for _, module_name, _ in pkgutil.walk_packages(
- package.__path__, package.__name__ + '.'):
- if 'test' in module_name:
- continue
- submodules.append(module_name)
- return submodules
-
-
try:
from apache_beam.ml.transforms import tft
from apache_beam.ml.transforms.base import MLTransform
# TODO(robertwb): Is this all of them?
- _transform_constructors = {}
+ _transform_constructors = tft.__dict__
Review Comment:

Using `tft.__dict__` to populate `_transform_constructors` is risky as it
includes not only the transform classes but also everything else in the `tft`
module's namespace (e.g., imported modules, constants, base classes). This
could lead to a `TypeError` at runtime if a YAML file specifies a `type` that
corresponds to a non-callable member from the dictionary.
A safer approach would be to build the dictionary of constructors from
`tft.__all__`, which is the idiomatic way to define a module's public API.
```python
_transform_constructors = {name: getattr(tft, name) for name in tft.__all__
if hasattr(tft, name)}
```
##########
sdks/python/apache_beam/ml/transforms/base.py:
##########
@@ -216,31 +212,21 @@
# Get all columns for this item
for col in columns:
- if isinstance(item, dict):
- result.append(item[col])
+ result.append(item[col])
return result
def _dict_output_fn(
columns: Sequence[str],
- batch: Sequence[Union[Dict[str, Any], beam.Row]],
- embeddings: Sequence[Any]) -> list[Union[dict[str, Any], beam.Row]]:
+ batch: Sequence[Dict[str, Any]],
+ embeddings: Sequence[Any]) -> List[Dict[str, Any]]:
Review Comment:

Consider adding a type hint for the return value of `_dict_output_fn` for
better code clarity and maintainability.
```python
def _dict_output_fn(
columns: Sequence[str],
batch: Sequence[Dict[str, Any]],
embeddings: Sequence[Any]) -> List[Dict[str, Any]]:
```
##########
sdks/python/apache_beam/ml/transforms/base.py:
##########
@@ -183,13 +183,9 @@ def append_transform(self, transform: BaseOperation):
"""
-def _dict_input_fn(
- columns: Sequence[str], batch: Sequence[Union[Dict[str, Any],
- beam.Row]]) -> List[str]:
+def _dict_input_fn(columns: Sequence[str],
+ batch: Sequence[Dict[str, Any]]) -> List[str]:
Review Comment:

Consider adding a type hint for the return value of `_dict_input_fn` for
better code clarity and maintainability.
```python
def _dict_input_fn(columns: Sequence[str],
batch: Sequence[Dict[str, Any]]) -> List[str]:
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]