Enis Nazif created BEAM-9249:
--------------------------------
Summary: Python SDK DirectRunner with multiprocessing fails with
grpc.RpcError: 'Received message larger than max'
Key: BEAM-9249
URL: https://issues.apache.org/jira/browse/BEAM-9249
Project: Beam
Issue Type: Bug
Components: runner-direct
Affects Versions: 2.19.0
Reporter: Enis Nazif
When running a pipeline using the Python SDK, the DirectRunner, and a
direct_running_mode of 'multi_processing' or 'multi_threading', the job fails
with a gRPC RESOURCE_EXHAUSTED error: "Received message larger than max
(4806980 vs. 4194304)".
Reproducible example:
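{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# direct_num_workers is the DirectRunner worker-count option
# (the original report passed num_direct_runners=4).
options = PipelineOptions(
    direct_running_mode='multi_processing', direct_num_workers=4)
p = beam.Pipeline(options=options)
_ = p | beam.Create(list(range(1000000)))
p.run()
{code}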
Output:
{code}
Traceback (most recent call last):
  File "/opt/man/releases/python-medusa/36-1/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/man/releases/python-medusa/36-1/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/users/is/enazif/beam/sdks/python/apache_beam/runners/worker/sdk_worker_main.py", line 249, in <module>
    main(sys.argv)
  File "/users/is/enazif/beam/sdks/python/apache_beam/runners/worker/sdk_worker_main.py", line 160, in main
    sdk_pipeline_options.view_as(ProfilingOptions))
  File "/users/is/enazif/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 137, in run
    for work_request in control_stub.Control(get_responses()):
  File "/users/is/enazif/pyenvs/04-02-2020-1/lib/python3.6/site-packages/grpcio-1.18.0+ahl1-py3.6-linux-x86_64.egg/grpc/_channel.py", line 364, in __next__
    return self._next()
  File "/users/is/enazif/pyenvs/04-02-2020-1/lib/python3.6/site-packages/grpcio-1.18.0+ahl1-py3.6-linux-x86_64.egg/grpc/_channel.py", line 358, in _next
    raise self
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
    status = StatusCode.RESOURCE_EXHAUSTED
    details = "Received message larger than max (4806980 vs. 4194304)"
    debug_error_string = "{"created":"@1580828874.271749579","description":"Received message larger than max (4806980 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":174,"grpc_status":8}"
>
Exception in thread run_worker:
Traceback (most recent call last):
  File "/opt/man/releases/python-medusa/36-1/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/opt/man/releases/python-medusa/36-1/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/users/is/enazif/beam/sdks/python/apache_beam/runners/portability/local_job_service.py", line 218, in run
    'Worker subprocess exited with return code %s' % p.returncode)
RuntimeError: Worker subprocess exited with return code 1
{code}
This seems to happen because we set grpc.max_receive_message_length and
grpc.max_send_message_length on the gRPC server side, but not on the channel
(client) side:
https://github.com/apache/beam/blob/7547ac6b273e6e2ffe7d69775606e14c0fd455b2/sdks/python/apache_beam/runners/worker/channel_factory.py#L28
Adding grpc.max_receive_message_length and grpc.max_send_message_length to the
default options there fixes the issue.
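
A rough sketch of what that change could look like, for illustration only. The helper names below are hypothetical and this is not the actual channel_factory.py code. The two option keys are the ones named above, and the value -1 is an assumption on my part (gRPC treats -1 as "no limit" for these arguments).

```python
# Sketch only: helper names are hypothetical, not Beam's real API.

def unlimited_message_size_options():
    """Channel options that lift gRPC's default 4194304-byte message cap.

    Assumption: -1 is used here because gRPC interprets it as unlimited.
    """
    return [
        ("grpc.max_receive_message_length", -1),
        ("grpc.max_send_message_length", -1),
    ]


def create_insecure_channel(target, extra_options=None):
    """Open a client channel with the size limits lifted by default.

    Caller-supplied options override the defaults on key collisions.
    """
    import grpc  # imported lazily so the option helper works without grpcio

    options = dict(unlimited_message_size_options())
    options.update(dict(extra_options or []))
    return grpc.insecure_channel(target, options=list(options.items()))
```

With defaults like these on the client side, the 4806980-byte message from the traceback would no longer exceed the channel's receive limit.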