[ 
https://issues.apache.org/jira/browse/BEAM-11719?focusedWorklogId=591694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-591694
 ]

ASF GitHub Bot logged work on BEAM-11719:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Apr/21 19:08
            Start Date: 30/Apr/21 19:08
    Worklog Time Spent: 10m 
      Work Description: shoyer commented on pull request #14680:
URL: https://github.com/apache/beam/pull/14680#issuecomment-830314492


   I pushed another commit that re-raises exceptions with more context when deterministic encoding fails.
   
   Example:
   ```
   In [1]: import collections
   
   In [2]: from apache_beam.coders import FastPrimitivesCoder
   
   In [3]: Pair = collections.namedtuple('Pair', ['x', 'y'])
   
   In [4]: coder = FastPrimitivesCoder().as_deterministic_coder('step')
   
   In [5]: coder.encode(Pair(1, {'x': 2}))
   ```
   
   On **master** this results in:
   ```
   ---------------------------------------------------------------------------
   TypeError                                 Traceback (most recent call last)
   <ipython-input-5-e48fdbc23ff7> in <module>
   ----> 1 coder.encode(Pair(1, {'x': 2}))
   
   ...
   
   ~/dev/beam/sdks/python/apache_beam/coders/coder_impl.py in encode_special_deterministic(self, value, stream)
       460           "Unable to deterministically encode '%s' of type '%s', "
       461           "please provide a type hint for the input of '%s'" %
   --> 462           (value, type(value), self.requires_deterministic_step_label))
       463
       464   def encode_type(self, t, stream):
   
   TypeError: Unable to deterministically encode '{'x': 2}' of type '<class 'dict'>', please provide a type hint for the input of 'step'
   ```
   
   With **this commit**:
   ```
   ---------------------------------------------------------------------------
   TypeError                                 Traceback (most recent call last)
   ~/dev/beam/sdks/python/apache_beam/coders/coder_impl.py in encode_special_deterministic(self, value, stream)
       456       try:
   --> 457         self.iterable_coder_impl.encode_to_stream(value, stream, True)
       458       except Exception as e:
   
   ...
   
   ~/dev/beam/sdks/python/apache_beam/coders/coder_impl.py in encode_special_deterministic(self, value, stream)
       482     else:
   --> 483       raise TypeError(self._deterministic_encoding_error_msg(value))
       484
   
   TypeError: Unable to deterministically encode '{'x': 2}' of type '<class 'dict'>', please provide a type hint for the input of 'step'
   
   The above exception was the direct cause of the following exception:
   
   TypeError                                 Traceback (most recent call last)
   <ipython-input-12-e48fdbc23ff7> in <module>
   ----> 1 coder.encode(Pair(1, {'x': 2}))
   
   ...
   
   ~/dev/beam/sdks/python/apache_beam/coders/coder_impl.py in encode_special_deterministic(self, value, stream)
       457         self.iterable_coder_impl.encode_to_stream(value, stream, True)
       458       except Exception as e:
   --> 459         raise TypeError(self._deterministic_encoding_error_msg(value)) from e
       460     elif isinstance(value, enum.Enum):
       461       stream.write_byte(ENUM_TYPE)
   
   TypeError: Unable to deterministically encode 'Pair(x=1, y={'x': 2})' of type '<class '__main__.Pair'>', please provide a type hint for the input of 'step'
   ```
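
The chained traceback above comes from Python's `raise ... from e` syntax. A minimal standalone sketch of the same pattern follows. It does not use Beam: `encode_leaf`, `encode_container`, and `demo` are hypothetical stand-ins for illustration, not the actual `coder_impl.py` code.

```python
import collections


def encode_leaf(value):
    """Hypothetical stand-in for a deterministic encoder that rejects dicts."""
    if isinstance(value, dict):
        raise TypeError(
            "Unable to deterministically encode %r of type %r"
            % (value, type(value)))
    return repr(value).encode("utf-8")


def encode_container(value):
    """Encode each element; on failure, re-raise naming the whole container."""
    try:
        return b",".join(encode_leaf(item) for item in value)
    except Exception as e:
        # "from e" stores the inner error on __cause__, so the traceback
        # shows both the failing element and the enclosing value.
        raise TypeError(
            "Unable to deterministically encode %r of type %r"
            % (value, type(value))) from e


def demo():
    Pair = collections.namedtuple("Pair", ["x", "y"])
    try:
        encode_container(Pair(1, {"x": 2}))
    except TypeError as outer:
        return str(outer), str(outer.__cause__)


if __name__ == "__main__":
    outer_msg, cause_msg = demo()
    print("outer:", outer_msg)
    print("cause:", cause_msg)
```

The outer message names the full `Pair` value, while the chained cause still points at the nested dict that could not be encoded.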




Issue Time Tracking
-------------------

    Worklog Id:     (was: 591694)
    Time Spent: 13h  (was: 12h 50m)

> Enforce deterministic coding for GroupByKey and Stateful DoFns
> --------------------------------------------------------------
>
>                 Key: BEAM-11719
>                 URL: https://issues.apache.org/jira/browse/BEAM-11719
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Robert Bradshaw
>            Assignee: Robert Bradshaw
>            Priority: P1
>             Fix For: 2.29.0
>
>          Time Spent: 13h
>  Remaining Estimate: 0h
>
> If a non-deterministic coder, such as pickling, is used for keys this can 
> result in two copies of the same key being grouped separately (based on their 
> encodings). 
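
One way the problem described above can arise, sketched with the standard `pickle` module only. This is an illustration under the assumption that keys are grouped by their encoded bytes; it is not Beam's grouping code, and `group_by_encoding` is a hypothetical helper.

```python
import collections
import pickle


def group_by_encoding(keys):
    """Group keys by their pickled bytes (hypothetical helper)."""
    groups = collections.defaultdict(list)
    for key in keys:
        groups[pickle.dumps(key)].append(key)
    return groups


# Two dicts that compare equal but were built in a different order.
key_a = {"a": 1, "b": 2}
key_b = {"b": 2, "a": 1}

groups = group_by_encoding([key_a, key_b])

if __name__ == "__main__":
    print("equal as values:", key_a == key_b)
    print("number of groups:", len(groups))
```

Because pickle writes dict items in insertion order, the two equal keys produce different byte strings and end up in separate groups.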



