damccorm opened a new pull request, #36495:
URL: https://github.com/apache/beam/pull/36495

   This downgrades the determinism requirement for GBEK (`GroupByEncryptedKey`) coders from an error to a warning. This matches what GBK (`GroupByKey`) does today. That matters because users should be able to simply add the `--gbek` pipeline option and have things just work.
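A minimal sketch of the error-versus-warning decision, assuming a coder object that exposes an `is_deterministic()` method. Beam's Python `Coder` does expose this method, but `FakeCoder`, `check_key_coder` and its `strict` flag are hypothetical names made up for illustration and are not Beam's code. The sketch models only the decision itself. It does not model where or when GBEK actually performs the check; in the traceback below, the failure surfaces at encode time.

```python
import logging


class FakeCoder:
    """Hypothetical stand-in for a Beam coder; models only determinism."""

    def __init__(self, name, deterministic):
        self.name = name
        self._deterministic = deterministic

    def is_deterministic(self):
        return self._deterministic


def check_key_coder(coder, label, strict=False):
    """Hypothetical helper: validate that a grouping key coder is deterministic.

    Returns True for a deterministic coder. Otherwise it either raises
    ValueError (strict=True, the old GBEK behavior) or logs a warning and
    returns False (the GBK-style behavior this PR moves GBEK to).
    """
    if coder.is_deterministic():
        return True
    message = (
        f"The key coder {coder.name} for {label!r} is not deterministic. "
        "This may result in incorrect pipeline output.")
    if strict:
        # Old behavior: refuse to continue.
        raise ValueError(message)
    # New behavior: warn, as GroupByKey does, and let the pipeline run.
    logging.warning(message)
    return False
```

With `strict=False`, a non-deterministic coder only produces a log line. With `strict=True`, the same input stops the pipeline. That is the difference between a drop-in `--gbek` option and one that breaks existing transforms.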
   
   Today, some of our built-in Beam transforms fail while this is an error. For example, without this change, [testDataframeSum](https://github.com/apache/beam/blob/d54a661f47e87c894f84a7cf63fac03bae6f3ec3/sdks/java/extensions/python/src/test/java/org/apache/beam/sdk/extensions/python/transforms/DataframeTransformTest.java#L37) fails with:
   
   ```
   java.lang.RuntimeException: Traceback (most recent call last):
     File "apache_beam/coders/coder_impl.py", line 540, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.encode_special_deterministic
     File "apache_beam/coders/coder_impl.py", line 460, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.encode_to_stream
     File "apache_beam/coders/coder_impl.py", line 481, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.encode_to_stream
     File "apache_beam/coders/coder_impl.py", line 544, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.encode_special_deterministic
   TypeError: Unable to deterministically encode 'BlockManager
   Items: Index(['b'], dtype='object')
   Axis 1: Index([100], dtype='int64', name='a')
   NumpyBlock: slice(0, 1, 1), 1 x 1, dtype: int32' of type '<class 'pandas.core.internals.managers.BlockManager'>', please provide a type hint for the input of 'GroupByEncryptedKey Group by encrypted keyThe key coder is not deterministic. This may result in incorrect pipeline output. This can be fixed by adding a type hint to the operation preceding the GroupByKey step, and for custom key classes, by writing a deterministic custom Coder. Please see the documentation for more details.'
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "apache_beam/runners/common.py", line 1498, in apache_beam.runners.common.DoFnRunner.process
     File "apache_beam/runners/common.py", line 684, in apache_beam.runners.common.SimpleInvoker.invoke_process
     File "apache_beam/runners/common.py", line 1673, in apache_beam.runners.common._OutputHandler.handle_process_outputs
     File "/usr/local/lib/python3.13/site-packages/apache_beam/transforms/util.py", line 444, in process
       encoded_value = self.value_coder.encode(v)
     File "/usr/local/lib/python3.13/site-packages/apache_beam/coders/coders.py", line 459, in encode
       return self.get_impl().encode(value)
              ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
     File "apache_beam/coders/coder_impl.py", line 237, in apache_beam.coders.coder_impl.StreamCoderImpl.encode
     File "apache_beam/coders/coder_impl.py", line 240, in apache_beam.coders.coder_impl.StreamCoderImpl.encode
     File "apache_beam/coders/coder_impl.py", line 1120, in apache_beam.coders.coder_impl.AbstractComponentCoderImpl.encode_to_stream
     File "apache_beam/coders/coder_impl.py", line 481, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.encode_to_stream
     File "apache_beam/coders/coder_impl.py", line 542, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.encode_special_deterministic
   TypeError: Unable to deterministically encode '     b
   a     
   100  3' of type '<class 'pandas.core.frame.DataFrame'>', please provide a type hint for the input of 'GroupByEncryptedKey Group by encrypted keyThe key coder is not deterministic. This may result in incorrect pipeline output. This can be fixed by adding a type hint to the operation preceding the GroupByKey step, and for custom key classes, by writing a deterministic custom Coder. Please see the documentation for more details.'
   
   During handling of the above exception, another exception occurred:
   ```
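This simplified, hypothetical deterministic encoder illustrates why the DataFrame in `testDataframeSum` trips this error. It is not Beam's `FastPrimitivesCoderImpl`; the function name and wire format are invented for this sketch. It accepts only types whose byte representation it fully controls. It sorts dict entries so that insertion order cannot affect the output. For anything else, such as a `pandas.DataFrame`, it raises `TypeError`.

```python
import struct


def encode_deterministic(value):
    """Hypothetical deterministic encoder for a handful of primitive types.

    For the supported types, values that compare equal encode to identical
    bytes. Unsupported types raise TypeError instead of falling back to a
    non-deterministic encoding such as pickling.
    """
    if value is None:
        return b"N"
    if isinstance(value, int):
        # int(value) maps bools to 0/1, consistent with True == 1.
        data = str(int(value)).encode("ascii")
        return b"I" + struct.pack(">I", len(data)) + data
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b"S" + struct.pack(">I", len(data)) + data
    if isinstance(value, (list, tuple)):
        # Distinct tags, since [1] != (1,) in Python.
        tag = b"L" if isinstance(value, list) else b"U"
        parts = [encode_deterministic(v) for v in value]
        return tag + struct.pack(">I", len(parts)) + b"".join(parts)
    if isinstance(value, dict):
        # Sort entries by their encoded bytes so insertion order cannot
        # change the output.
        items = sorted(
            (encode_deterministic(k), encode_deterministic(v))
            for k, v in value.items())
        return (b"D" + struct.pack(">I", len(items))
                + b"".join(k + v for k, v in items))
    raise TypeError(
        f"Unable to deterministically encode {value!r} of type {type(value)}")
```

For example, `{"b": 1, "a": 2}` and `{"a": 2, "b": 1}` encode to the same bytes, while `encode_deterministic(object())` raises `TypeError`. As the error message above notes, the user-side fixes are a type hint or a deterministic custom coder. This PR instead makes GBEK tolerate the non-deterministic case the way GBK does.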
   
   I'd expect other DataFrame tests to fail similarly.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
   - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
   - [ ] Update `CHANGES.md` with noteworthy changes.
   - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make the review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   
------------------------------------------------------------------------------------------------
   [![Build python source distribution and 
wheels](https://github.com/apache/beam/actions/workflows/build_wheels.yml/badge.svg?event=schedule&&?branch=master)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/actions/workflows/python_tests.yml/badge.svg?event=schedule&&?branch=master)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/actions/workflows/java_tests.yml/badge.svg?event=schedule&&?branch=master)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/actions/workflows/go_tests.yml/badge.svg?event=schedule&&?branch=master)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
