[
https://issues.apache.org/jira/browse/BEAM-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379850#comment-16379850
]
Luke Cwik commented on BEAM-3757:
---------------------------------
It is best to reach out to
[[email protected]|mailto:[email protected]]
for support with a job id and a link to this bug.
> Shuffle read failed using python 2.2.0
> --------------------------------------
>
> Key: BEAM-3757
> URL: https://issues.apache.org/jira/browse/BEAM-3757
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Affects Versions: 2.2.0
> Environment: gcp, macos
> Reporter: Jonathan Delfour
> Assignee: Thomas Groh
> Priority: Major
>
> Hi,
> The first issue is that the Beam 2.3.0 Python SDK is apparently not working
> on GCP. It gets stuck:
> {noformat}
> Workflow failed. Causes: (bf832d44290fbf41): The Dataflow appears to be
> stuck. You can get help with Cloud Dataflow at
> https://cloud.google.com/dataflow/support.
> {noformat}
> I tried twice.
> Reverting to 2.2.0 usually works, but today, after more than an hour of
> processing with 30 workers, I got a failure with this in the logs:
> {noformat}
> Traceback (most recent call last):
> File
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line
> 582, in do_work
> work_executor.execute()
> File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py",
> line 167, in execute
> op.start()
> File "dataflow_worker/shuffle_operations.py", line 49, in
> dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
> def start(self):
> File "dataflow_worker/shuffle_operations.py", line 50, in
> dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
> with self.scoped_start_state:
> File "dataflow_worker/shuffle_operations.py", line 65, in
> dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
> with self.shuffle_source.reader() as reader:
> File "dataflow_worker/shuffle_operations.py", line 67, in
> dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
> for key_values in reader:
> File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py",
> line 406, in __iter__
> for entry in entries_iterator:
> File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py",
> line 248, in next
> return next(self.iterator)
> File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py",
> line 206, in __iter__
> chunk, next_position = self.reader.Read(start_position, end_position)
> File "third_party/windmill/shuffle/python/shuffle_client.pyx", line 138, in
> shuffle_client.PyShuffleReader.Read
> IOError: Shuffle read failed: INTERNAL: Received RST_STREAM with error code 2
> talking to my-dataflow-02271107-756f-harness-2p65:12346
> {noformat}
> I also get this informational message:
> {noformat}
> Refusing to split <dataflow_worker.shuffle.GroupedShuffleRangeTracker object
> at 0x7f03a00fe790> at '\x00\x00\x00\t\x1d\x14\x87\xa3\x00\x01': unstarted
> {noformat}
> For the flow, I am extracting data from BigQuery, cleaning it with pandas,
> exporting it as a CSV file, gzipping it, and uploading the compressed file
> to a bucket using decompressive transcoding (the CSV export, gzip
> compression, and upload all happen in the same 'worker', as they are done
> in the same beam.DoFn).
> PS: I can't find a reasonable way to export the logs from GCP, but I can
> privately send the log file I have of the run on my machine (the log of
> the pipeline).
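For context, the export-and-compress step described above could look roughly like the sketch below. This is an assumption about the shape of the reporter's DoFn, not their actual code: it renders rows to CSV text and gzips the result in memory, which is the payload that would then be uploaded with `Content-Encoding: gzip` so that GCS decompressive transcoding applies on download. The pandas cleaning and the `beam.DoFn` wrapper are omitted; the function and variable names are hypothetical.

```python
import gzip
import io


def csv_to_gzip_bytes(rows, header):
    """Render rows to CSV text and gzip it in memory.

    Sketch of the CSV-export + compression step described in the
    report; the real pipeline wraps this kind of logic in a
    beam.DoFn and uses pandas for the cleaning step.
    """
    buf = io.StringIO()
    buf.write(",".join(header) + "\n")
    for row in rows:
        buf.write(",".join(str(v) for v in row) + "\n")
    return gzip.compress(buf.getvalue().encode("utf-8"))


# For decompressive transcoding, the compressed payload would then
# be uploaded with Content-Type: text/csv and Content-Encoding: gzip,
# e.g. with google-cloud-storage (hypothetical bucket/object names):
#
#   blob = bucket.blob("exports/data.csv")
#   blob.content_encoding = "gzip"
#   blob.upload_from_string(payload, content_type="text/csv")
```

With `content_encoding` set, GCS serves the object decompressed to clients that do not ask for the gzipped form, which matches the "decompressive transcoding" setup the reporter describes.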
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)