Chengxin Ma created ARROW-7200:
----------------------------------

Summary: Running Arrow Flight benchmark on two hosts doesn't work
Key: ARROW-7200
URL: https://issues.apache.org/jira/browse/ARROW-7200
Project: Apache Arrow
Issue Type: Bug
Components: Benchmarking, C++, FlightRPC
Affects Versions: 0.15.1, 0.15.0
Environment: AWS EC2
    Instance type: t3a.xlarge
    AMI: ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20191002
    Number of instances: 2
    They are capable of pinging each other.
Reporter: Chengxin Ma
Attachments: Screen Shot 2019-11-18 at 16.00.38.png
I was trying to evaluate the performance of Apache Arrow Flight on two hosts (one as the client and the other as the server), using [the official benchmark|https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/flight_benchmark.cc].

The flags I used to build the project were:
{code:java}
-DARROW_FLIGHT=ON -DCMAKE_BUILD_TYPE=Debug -DARROW_BUILD_BENCHMARKS=ON
{code}
The branch I used was maint-0.15.x, since there was a build error on the master branch.

_(The build error on master occurred only in the AWS environment where I set up the two hosts. In my local environment (macOS), the build succeeded on master. I don't think this build error is relevant to the issue, since there is no difference in the cpp source code.)_

On the host acting as the server, I ran
{code:java}
./arrow-flight-perf-server
{code}
On the host acting as the client, I ran
{code:java}
./arrow-flight-benchmark --server_host ip-172-31-11-18
{code}
This gives the following error:
{code:java}
Failed with error: << IOError: gRPC returned unavailable error, with message: Connect Failed. Detail: Unavailable
{code}
If I instead run
{code:java}
./arrow-flight-benchmark --server_host ip-172-31-11-17
{code}
the error is different:
{code:java}
IOError: Server was not available after 10 attempts
{code}
This is understandable, since that host doesn't exist at all. The difference between the two errors suggests that Flight is able to find the existing host (ip-172-31-11-18), but that the communication with it somehow fails.

The benchmark works fine on localhost, either when I leave out the server_host flag or when I run the server in a separate process on the same host.

I am not sure whether the problem is in the environment or in the code itself. Could someone please give me a hint on how to resolve it?
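
To tell a name-resolution failure apart from a failed TCP connection independently of Flight and gRPC, a small probe could be run from the client host. The following is a minimal sketch, not part of the Arrow benchmark: the `probe` helper is hypothetical, and the port in the usage comment (31337) is an assumption about the perf server's default that should be checked against the server's actual configuration.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify the reachability of a TCP endpoint (hypothetical helper).

    Returns one of: "unresolvable", "refused", "timeout",
    "unreachable", "open".
    """
    try:
        # Step 1: name resolution. A nonexistent host fails here.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"

    # Step 2: TCP connect to the first resolved address.
    family, socktype, proto, _, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
        except ConnectionRefusedError:
            # The host answered, but nothing accepts connections on this port
            # on the interface that was reached.
            return "refused"
        except socket.timeout:
            # No answer at all; packets may be dropped along the way.
            return "timeout"
        except OSError:
            return "unreachable"
    return "open"


# Example, to be run on the client host (the port is an assumption):
#   print(probe("ip-172-31-11-18", 31337))
```

A result of "refused" would be consistent with the server listening only on the loopback interface, while "timeout" would be more consistent with traffic being filtered between the two instances, for example by a security group rule. Either reading is only a hint and would need to be confirmed on the server side.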