bzablocki commented on code in PR #27284:
URL: https://github.com/apache/beam/pull/27284#discussion_r1300123614
##########
sdks/python/apache_beam/yaml/yaml_provider.py:
##########
@@ -415,6 +415,10 @@ def _parse_window_spec(spec):
'Flatten': Flatten,
'WindowInto': WindowInto,
'GroupByKey': beam.GroupByKey,
+ 'SumPerKey': lambda: beam.CombinePerKey(sum),
Review Comment:
Two things:
1) I wanted to implement the CombineGlobally transform that would take a
TopN combiner, as you suggested. For some reason I can't get it to work...
That's my code for defining the transform in the dictionary in the
yaml_provider.py:
```python
'CombineGlobally': lambda fn: beam.CombineGlobally(
fn=python_callable.PythonCallableWithSource(fn)),
```
Then in the yaml pipeline I would use it like this:
```
- type: CombineGlobally
name: MostPopular
input: XXX
fn: "beam.combiners.TopCombineFn(1)"
```
but that results in error:
```
self = <apache_beam.utils.python_callable.PythonCallableWithSource object at
0x7f9802d93b20>
args = ([3, 4],), kwargs = {}
def __call__(self, *args, **kwargs):
> return self._callable(*args, **kwargs)
E TypeError: 'TopCombineFn' object is not callable
```
Do you have a suggestion how to implement it otherwise?
2) I still think that we should have high level transforms that are popular,
like 'TopN', 'Largest', 'Smallest', etc available in the global scope. If the
yaml api is aimed at less experienced developer, they will have a hard time
creating these transforms with callable arguments.
##########
sdks/python/apache_beam/yaml/yaml_provider.py:
##########
@@ -415,6 +415,10 @@ def _parse_window_spec(spec):
'Flatten': Flatten,
'WindowInto': WindowInto,
'GroupByKey': beam.GroupByKey,
+ 'SumPerKey': lambda: beam.CombinePerKey(sum),
Review Comment:
Two things:
1) I wanted to implement the CombineGlobally transform that would take a
TopN combiner, as you suggested. For some reason I can't get it to work...
That's my code for defining the transform in the dictionary in the
yaml_provider.py:
```python
'CombineGlobally': lambda fn: beam.CombineGlobally(
fn=python_callable.PythonCallableWithSource(fn)),
```
Then in the yaml pipeline I would use it like this:
```
- type: CombineGlobally
name: MostPopular
input: XXX
fn: "beam.combiners.TopCombineFn(1)"
```
but that results in error:
```
self = <apache_beam.utils.python_callable.PythonCallableWithSource object at
0x7f9802d93b20>
args = ([3, 4],), kwargs = {}
def __call__(self, *args, **kwargs):
> return self._callable(*args, **kwargs)
E TypeError: 'TopCombineFn' object is not callable
```
Do you have a suggestion how to implement it otherwise?
2) I still think that we should have high level transforms that are popular,
like 'TopN', 'Largest', 'Smallest', etc available in the global scope. If the
yaml api is aimed at less experienced developer, they will have a hard time
creating these transforms with callable arguments.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]