[
https://issues.apache.org/jira/browse/BEAM-11719?focusedWorklogId=591694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591694
]
ASF GitHub Bot logged work on BEAM-11719:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 30/Apr/21 19:08
Start Date: 30/Apr/21 19:08
Worklog Time Spent: 10m
Work Description: shoyer commented on pull request #14680:
URL: https://github.com/apache/beam/pull/14680#issuecomment-830314492
I pushed another commit that re-raises exceptions with more context when
deterministic encoding fails.
Example:
```
In [1]: import collections
In [2]: from apache_beam.coders import FastPrimitivesCoder
In [3]: Pair = collections.namedtuple('Pair', ['x', 'y'])
In [4]: coder = FastPrimitivesCoder().as_deterministic_coder('step')
In [5]: coder.encode(Pair(1, {'x': 2}))
```
On **master** this results in:
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-e48fdbc23ff7> in <module>
----> 1 coder.encode(Pair(1, {'x': 2}))
...
~/dev/beam/sdks/python/apache_beam/coders/coder_impl.py in encode_special_deterministic(self, value, stream)
460 "Unable to deterministically encode '%s' of type '%s', "
461 "please provide a type hint for the input of '%s'" %
--> 462 (value, type(value), self.requires_deterministic_step_label))
463
464 def encode_type(self, t, stream):
TypeError: Unable to deterministically encode '{'x': 2}' of type '<class 'dict'>', please provide a type hint for the input of 'step'
```
With **this commit**:
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/dev/beam/sdks/python/apache_beam/coders/coder_impl.py in encode_special_deterministic(self, value, stream)
456 try:
--> 457 self.iterable_coder_impl.encode_to_stream(value, stream, True)
458 except Exception as e:
...
~/dev/beam/sdks/python/apache_beam/coders/coder_impl.py in encode_special_deterministic(self, value, stream)
482 else:
--> 483 raise TypeError(self._deterministic_encoding_error_msg(value))
484
TypeError: Unable to deterministically encode '{'x': 2}' of type '<class 'dict'>', please provide a type hint for the input of 'step'

The above exception was the direct cause of the following exception:

TypeError Traceback (most recent call last)
<ipython-input-12-e48fdbc23ff7> in <module>
----> 1 coder.encode(Pair(1, {'x': 2}))
...
~/dev/beam/sdks/python/apache_beam/coders/coder_impl.py in encode_special_deterministic(self, value, stream)
457 self.iterable_coder_impl.encode_to_stream(value, stream, True)
458 except Exception as e:
--> 459 raise TypeError(self._deterministic_encoding_error_msg(value)) from e
460 elif isinstance(value, enum.Enum):
461 stream.write_byte(ENUM_TYPE)
TypeError: Unable to deterministically encode 'Pair(x=1, y={'x': 2})' of type '<class '__main__.Pair'>', please provide a type hint for the input of 'step'
```
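The pattern the commit applies (catch the error raised for a component, then `raise ... from e` with a message naming the enclosing value) can be illustrated with a minimal, hypothetical sketch. This is not Beam's `coder_impl` code: the toy `encode` function, its byte format, and the supported types (ints, strs, tuples) are assumptions made for illustration; only the error-message wording is copied from the traceback above.

```python
import collections

# Hypothetical toy encoder illustrating exception chaining; NOT Beam's
# coder_impl. Only the message wording mirrors the traceback above.


def _error_msg(value, step_label):
    return ("Unable to deterministically encode '%s' of type '%s', "
            "please provide a type hint for the input of '%s'" %
            (value, type(value), step_label))


def _encode_to(value, step_label, parts):
    """Append a deterministic encoding of value to the list parts."""
    if isinstance(value, int):
        parts.append(b'i%d;' % value)
    elif isinstance(value, str):
        data = value.encode('utf-8')
        parts.append((b's%d:' % len(data)) + data)
    elif isinstance(value, tuple):  # also covers namedtuples such as Pair
        parts.append(b't%d:' % len(value))
        try:
            for component in value:
                _encode_to(component, step_label, parts)
        except Exception as e:
            # Re-raise naming the enclosing tuple; the original error is
            # kept as e.__cause__ ("direct cause" in the traceback).
            raise TypeError(_error_msg(value, step_label)) from e
    else:
        # Anything else (e.g. a dict) is rejected in this sketch.
        raise TypeError(_error_msg(value, step_label))


def encode(value, step_label):
    parts = []
    _encode_to(value, step_label, parts)
    return b''.join(parts)


Pair = collections.namedtuple('Pair', ['x', 'y'])

try:
    encode(Pair(1, {'x': 2}), 'step')
except TypeError as e:
    print(e)            # message names the enclosing Pair
    print(e.__cause__)  # message names the inner dict
```

Without the `try`/`raise ... from e` in the tuple branch, the only message would mention the inner `{'x': 2}`, which is the behavior shown for master.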
--
This is an automated message from the Apache Git Service.
Issue Time Tracking
-------------------
Worklog Id: (was: 591694)
Time Spent: 13h (was: 12h 50m)
> Enforce deterministic coding for GroupByKey and Stateful DoFns
> --------------------------------------------------------------
>
> Key: BEAM-11719
> URL: https://issues.apache.org/jira/browse/BEAM-11719
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Robert Bradshaw
> Assignee: Robert Bradshaw
> Priority: P1
> Fix For: 2.29.0
>
> Time Spent: 13h
> Remaining Estimate: 0h
>
> If a non-deterministic coder, such as pickling, is used for keys, this can
> result in two copies of the same key being grouped separately (based on their
> encodings).
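The failure mode described in the issue can be reproduced in plain Python. A minimal sketch, assuming grouping is keyed on the encoded bytes of each key (as a shuffle does) and using `pickle.dumps` as the non-deterministic coder: two equal dicts built in different insertion orders pickle to different bytes and land in separate groups. `group_by_encoded_key` and `sorted_items_encode` are hypothetical helpers, not Beam APIs.

```python
import collections
import pickle


def group_by_encoded_key(pairs, encode):
    """Group (key, value) pairs by the encoded bytes of each key.

    Hypothetical helper: grouping compares encoded keys, not Python
    equality, so equal keys with different encodings end up apart.
    """
    groups = collections.defaultdict(list)
    for key, value in pairs:
        groups[encode(key)].append(value)
    return dict(groups)


def sorted_items_encode(d):
    """Hypothetical order-independent encoding for dicts of sortable items."""
    return pickle.dumps(sorted(d.items()))


k1 = {'a': 1, 'b': 2}
k2 = {'b': 2, 'a': 1}  # equal to k1, but built in a different order
pairs = [(k1, 'v1'), (k2, 'v2')]

# pickle writes dict entries in insertion order, so equal keys get
# different bytes and the "same" key is split into two groups.
print(len(group_by_encoded_key(pairs, pickle.dumps)))         # 2
# An order-independent encoding puts both values in one group.
print(len(group_by_encoded_key(pairs, sorted_items_encode)))  # 1
```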
--
This message was sent by Atlassian Jira
(v8.3.4#803005)