cyb70289 commented on pull request #12442: URL: https://github.com/apache/arrow/pull/12442#issuecomment-1080476330
I ran into some issues when running the Flight UCX transport tests: most tests fail on my machine. I haven't debugged it yet. It would be great if @lidavidm could give some quick comments.

The log below is from `UcxDataTest.TestDoGetInts`; the logs from the other tests are similar. The full debug log is attached in case it is useful: [ucx-debug-log.txt](https://github.com/apache/arrow/files/8361815/ucx-debug-log.txt)

```
cyb@my-test-host:~/arrow/cpp/debug$ UCX_LOG_LEVEL=INFO debug/arrow-flight-transport-ucx-test --gtest_filter="UcxDataTest.TestDoGetInts"
Running main() from /home/cyb/arrow/cpp/debug/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc
Note: Google Test filter = UcxDataTest.TestDoGetInts
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from UcxDataTest
[ RUN ] UcxDataTest.TestDoGetInts
[1648461566.449001] [my-test-host:4147425:0] ucp_context.c:1778 UCX INFO UCP version is 1.12 (release 0)
[1648461566.472887] [my-test-host:4147425:0] ucp_worker.c:2118 UCX DIAG multi-threaded worker is requested, but library is built without multi-thread support
[1648461566.514306] [my-test-host:4147425:0] parser.c:1914 UCX INFO UCX_* env variable: UCX_LOG_LEVEL=INFO
[1648461566.514950] [my-test-host:4147425:0] ucp_context.c:1778 UCX INFO UCP version is 1.12 (release 0)
[1648461566.581439] [my-test-host:4147425:0] sock.c:660 UCX ERROR unknown address family: 0
[1648461566.585326] [my-test-host:4147425:a] rdmacm_cm.c:351 UCX DIAG [cep 0xaaab1a74ddf0 ::1:49784->::1:55919 client Success]: rdma_resolve_route failed: No such device
[1648461566.585361] [my-test-host:4147425:a] uct_cm.c:99 UCX DIAG resolve callback failed with error: Destination is unreachable
[1648461566.629139] [my-test-host:4147425:1] ucp_worker.c:1866 UCX INFO ep_cfg[0]: am(tcp/lo);
[1648461566.629540] [my-test-host:4147425:1] sock.c:325 UCX ERROR connect(fd=85, dest_addr=::1:47937) failed: Connection refused
[1648461566.629552] [my-test-host:4147425:1] wireup_cm.c:1203 UCX WARN server ep 0xffff9e27b000 failed to connect to remote address on device lo, tl_bitmap 0x40 0x0, status Destination is unreachable
[1648461566.629619] [my-test-host:4147425:0] ucp_ep.c:1222 UCX DIAG ep 0xffff9f7e6000: error 'Operation rejected by remote peer' on CM lane will not be handled since no error callback is installed
../src/arrow/flight/transport/ucx/ucx_server.cc:490: [server][peer=a00:8c0a:::35850] Failed to create endpoint: IOError: ucp_ep_create: UCX error -6: UCS_ERR_UNREACHABLE Destination is unreachable
../src/arrow/flight/test_definitions.cc:142: Failure
Failed
'client_->GetFlightInfo(descr, &info)' failed with IOError: ucp_request_check_status: UCX error -23: UCS_ERR_REJECTED Operation rejected by remote peer
[1648461566.630037] [my-test-host:4147425:0] flush.c:26 UCX ERROR req 0xaaab1a79e380: error during flush: Endpoint timeout, flush comp 0xaaab1a79e420 count reduced to 1
[1648461566.630047] [my-test-host:4147425:0] flush.c:26 UCX ERROR req 0xaaab1a79e380: error during flush: Endpoint timeout, flush comp 0xaaab1a79e420 count reduced to 0
free(): invalid pointer
Aborted (core dumped)
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
