gemini-code-assist[bot] commented on code in PR #38236:
URL: https://github.com/apache/beam/pull/38236#discussion_r3102724500

##########
sdks/python/apache_beam/yaml/yaml_mapping.py:
##########
@@ -16,8 +16,13 @@
 #

 """This module defines the basic MapToFields operation."""
+
+import datetime
 import itertools
+import json

Review Comment:
   The `json` module is imported but does not appear to be used anywhere in the modified code. If it's not required for other parts of the file not shown in this diff, it should be removed to keep the imports clean.

##########
sdks/python/apache_beam/yaml/yaml_mapping.py:
##########
@@ -205,83 +212,71 @@ def py_value_to_js_dict(py_value):
   return py_value


+def js_to_py(obj):
+  """Converts mini-racer mapped objects to standard Python types.
+
+  This is needed because ctx.eval returns objects that implement Mapping
+  and Iterable but are not picklable (like JSMappedObjectImpl and JSArrayImpl),
+  which would fail when Beam tries to serialize rows containing them.
+  We also preserve datetime objects which are correctly produced by ctx.eval
+  for JS Date objects.
+  """
+  if isinstance(obj, datetime.datetime):
+    return obj
+  elif isinstance(obj, Mapping):
+    return {k: js_to_py(v) for k, v in obj.items()}
+  elif not isinstance(obj, str) and isinstance(obj, Iterable):

Review Comment:
   The check `isinstance(obj, Iterable)` will return `True` for `bytes` objects, causing `js_to_py` to convert a `bytes` object into a list of integers. If the intention is to preserve binary data (e.g., from a JS `ArrayBuffer`), you should exclude `bytes` from this recursion, similar to how `str` is handled.
   ```suggestion
     elif not isinstance(obj, (str, bytes)) and isinstance(obj, Iterable):
   ```

##########
sdks/python/apache_beam/yaml/yaml_udf_test.py:
##########
@@ -374,6 +374,33 @@ def g(x):
                   row=beam.Row(rank=2, values=[7, 8, 9])),
           ]))

+  @unittest.skipIf(MiniRacer is None, 'py_mini_racer not installed.')
+  def test_map_to_fields_js_date(self):
+    import datetime
+    with beam.Pipeline(options=beam.options.pipeline_options.PipelineOptions(
+        pickle_library='cloudpickle', yaml_experimental_features=['javascript'
+                                                                  ])) as p:
+      elements = p | beam.Create([beam.Row(label='11a')])
+      result = elements | YamlTransform(
+          '''
+      type: MapToFields
+      config:
+        language: javascript
+        fields:
+          date:
+            callable: |
+              function get_date(x) {
+                return new Date('2026-04-17T18:00:00Z')
+              }
+      ''')
+
+      expected_date = '2026-04-17T18:00:00.000Z'
+
+      assert_that(
+          result | as_rows(), equal_to([
+              beam.Row(date=expected_date),

Review Comment:
   The test expects the `date` field to be a string (`'2026-04-17T18:00:00.000Z'`), but the `js_to_py` function in `yaml_mapping.py` is explicitly designed to preserve `datetime.datetime` objects returned by `MiniRacer`. This will likely cause an assertion failure because a `datetime` object is not equal to its string representation. You should use a `datetime` object in the expected row.
   ```suggestion
         expected_date = datetime.datetime(2026, 4, 17, 18, tzinfo=datetime.timezone.utc)

         assert_that(
             result | as_rows(), equal_to([
                 beam.Row(date=expected_date),
   ```

##########
sdks/python/apache_beam/yaml/yaml_mapping.py:
##########
@@ -16,8 +16,13 @@
 #

 """This module defines the basic MapToFields operation."""
+
+import datetime
 import itertools
+import json
 import re
+import threading
+import uuid
 from collections import abc
 from collections.abc import Callable
 from collections.abc import Collection

Review Comment:
   The types `Mapping` and `Iterable` are used in the `js_to_py` function (lines 226 and 228) but have not been imported into the module's namespace.
   This will cause a `NameError` at runtime when a JavaScript mapping function is executed. Please add them to the `collections.abc` imports.
   ```suggestion
   from collections.abc import Callable
   from collections.abc import Collection
   from collections.abc import Iterable
   from collections.abc import Mapping
   ```
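
For reference, the review comments above on `js_to_py` (the missing `Mapping`/`Iterable` imports, the `bytes` handling, and the `datetime` expectation in the test) can be checked in isolation. The following is only a minimal sketch, not the code from the PR: the body of the `Iterable` branch is not visible in the diff, so the list comprehension is an assumption, and plain `dict`, `tuple`, and `bytes` values stand in for the objects mini-racer would actually return.

```python
import datetime
from collections.abc import Iterable
from collections.abc import Mapping


def js_to_py(obj):
  """Standalone sketch of the helper under review (not the PR's exact code)."""
  if isinstance(obj, datetime.datetime):
    # Preserve datetimes as-is.
    return obj
  elif isinstance(obj, Mapping):
    return {k: js_to_py(v) for k, v in obj.items()}
  elif not isinstance(obj, (str, bytes)) and isinstance(obj, Iterable):
    # Assumed branch body: the diff is cut off before this line.
    return [js_to_py(v) for v in obj]
  return obj


# bytes is Iterable, and iterating it yields integers, which is why it
# needs the same exclusion as str.
assert isinstance(b'ab', Iterable)
assert list(b'ab') == [97, 98]

when = datetime.datetime(2026, 4, 17, 18, tzinfo=datetime.timezone.utc)
converted = js_to_py({'payload': b'ab', 'tags': ('x', 'y'), 'when': when})

assert converted['payload'] == b'ab'     # binary data left intact
assert converted['tags'] == ['x', 'y']   # other iterables become lists
assert converted['when'] is when         # datetime passed through unchanged

# A datetime never compares equal to its string form, so a test that
# expects a string would fail if the helper returns a datetime.
assert when != '2026-04-17T18:00:00.000Z'
```

Whether mini-racer returns a timezone-aware `datetime` for a JS `Date` is not confirmed by the diff, so the `tzinfo` in the suggested expected value may need adjusting against the actual runtime behaviour.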
