[ https://issues.apache.org/jira/browse/BEAM-12357?focusedWorklogId=609431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609431 ]
ASF GitHub Bot logged work on BEAM-12357:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 10/Jun/21 00:34
Start Date: 10/Jun/21 00:34
Worklog Time Spent: 10m
Work Description: heidimhurst commented on a change in pull request #14869:
URL: https://github.com/apache/beam/pull/14869#discussion_r648770142
##########
File path: sdks/python/apache_beam/transforms/util.py
##########
@@ -730,14 +731,30 @@ def from_runner_api_parameter(
     return Reshuffle()
+def fn_takes_side_inputs(fn):
+  try:
+    signature = get_signature(fn)
+  except TypeError:
+    # We can't tell; maybe it does.
+    return True
+
+  return (
+      len(signature.parameters) > 1 or any(
+          p.kind == p.VAR_POSITIONAL or p.kind == p.VAR_KEYWORD
+          for p in signature.parameters.values()))
+
+
 @ptransform_fn
-def WithKeys(pcoll, k):
+def WithKeys(pcoll, k, *args, **kwargs):
   """PTransform that takes a PCollection, and either a constant key or a
   callable, and returns a PCollection of (K, V), where each of the values in
   the input PCollection has been paired with either the constant key or a key
-  computed from the value.
+  computed from the value. The callable may optionally accept positional or
+  keyword arguments, which should be passed to WithKeys directly.
   """
   if callable(k):
+    if fn_takes_side_inputs(k):
+      return pcoll | Map(lambda v: (k(v, *args, **kwargs), v))
Review comment:
I was too hasty in incorporating this feedback, @pabloem. Because the lambda
does not have `*args` and `**kwargs` in its own signature, there is no need
(and no way) to pass them in directly. Instead, the lambda acts as a wrapper
that feeds the `*args` and `**kwargs` given to `WithKeys` directly into `k`.
At least that is how I understand what's happening here. Does that make sense?
Tests were passing before (and this is the fix I'm currently using in
production with my team), so I think this change was unnecessary.
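
The wrapper behaviour described in this comment can be illustrated outside Beam
with a small plain-Python sketch. The names `make_keyed` and `prefix_key` are
hypothetical and exist only for this illustration; they are not part of the
pull request. The point is that the inner lambda takes a single parameter `v`
and picks up `args` and `kwargs` from the enclosing scope.

```python
import inspect


def make_keyed(k, *args, **kwargs):
    """Hypothetical helper mirroring the lambda used in the diff above."""
    # The lambda's own signature is just (v). args and kwargs are not
    # parameters of the lambda; they are captured from this enclosing call.
    return lambda v: (k(v, *args, **kwargs), v)


def prefix_key(value, prefix, sep="-"):
    """Hypothetical key function that needs extra inputs besides the value."""
    return f"{prefix}{sep}{value}"


keyed = make_keyed(prefix_key, "id", sep=":")

# Built-in map stands in for a one-argument-per-element transform.
pairs = list(map(keyed, ["a", "b"]))

# The wrapper exposes exactly one parameter to its caller.
wrapper_params = list(inspect.signature(keyed).parameters)
```

Here `pairs` comes out as `[("id:a", "a"), ("id:b", "b")]` and `wrapper_params`
as `["v"]`, which is consistent with the reading that the extra arguments never
have to be passed to the lambda itself.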
Issue Time Tracking
-------------------
Worklog Id: (was: 609431)
Time Spent: 2h 50m (was: 2h 40m)
> WithKeys transform to take args, kwargs
> ---------------------------------------
>
> Key: BEAM-12357
> URL: https://issues.apache.org/jira/browse/BEAM-12357
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Heidi Hurst
> Assignee: Heidi Hurst
> Priority: P3
> Labels: newbie
> Original Estimate: 1h
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> The current WithKeys implementation for Python precludes the use of functions
> which require additional inputs. An updated version would exhibit the same
> behavior as `beam.Map`, which can receive inputs passed in as
> `beam.Map(some_fn, foo=foo, bar=bar)`.
> (Builds off of https://issues.apache.org/jira/browse/BEAM-7023).
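
As a rough illustration of the behaviour requested above, the following
plain-Python sketch pairs values with keys over an ordinary list rather than a
PCollection. It is an analogue written under assumptions, not the Beam
implementation: `with_keys` is a hypothetical stand-in for `WithKeys`, and the
standard library's `inspect.signature` is used in place of Beam's
`get_signature` helper, whose exact behaviour is not shown in this message.

```python
import inspect


def fn_takes_side_inputs(fn):
    """Guess whether fn accepts more than the single element argument."""
    try:
        # Assumption: inspect.signature stands in for Beam's get_signature.
        signature = inspect.signature(fn)
    except (TypeError, ValueError):
        # We can't tell; assume it might take extra inputs.
        return True
    return len(signature.parameters) > 1 or any(
        p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
        for p in signature.parameters.values())


def with_keys(values, k, *args, **kwargs):
    """List-based analogue of WithKeys: returns (key, value) pairs."""
    if callable(k):
        if fn_takes_side_inputs(k):
            # Forward the extra arguments to the key function.
            return [(k(v, *args, **kwargs), v) for v in values]
        return [(k(v), v) for v in values]
    # A non-callable k is used as a constant key.
    return [(k, v) for v in values]


constant = with_keys([1, 2], "x")
computed = with_keys([1, 2], lambda v: v * 2)
with_extra = with_keys([1, 2], lambda v, offset: v + offset, offset=10)
```

The three calls cover the constant key, the one-argument callable, and the
callable that needs an extra keyword argument, the last being the case the
issue asks for.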