cyb70289 commented on pull request #12442:
URL: https://github.com/apache/arrow/pull/12442#issuecomment-1080476330


   I ran into some issues when running the Flight UCX transport tests; most 
tests fail on my machine.
   I haven't debugged this yet. It would be great if @lidavidm could give some 
quick comments.
   The log below is from `UcxDataTest.TestDoGetInts`; the logs from the other 
tests are similar.
   The full debug log is attached in case it is useful: 
[ucx-debug-log.txt](https://github.com/apache/arrow/files/8361815/ucx-debug-log.txt)
   
   ```
   cyb@my-test-host:~/arrow/cpp/debug$ UCX_LOG_LEVEL=INFO debug/arrow-flight-transport-ucx-test --gtest_filter="UcxDataTest.TestDoGetInts"
   Running main() from /home/cyb/arrow/cpp/debug/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc
   Note: Google Test filter = UcxDataTest.TestDoGetInts
   [==========] Running 1 test from 1 test suite.
   [----------] Global test environment set-up.
   [----------] 1 test from UcxDataTest
   [ RUN      ] UcxDataTest.TestDoGetInts
   [1648461566.449001] [my-test-host:4147425:0]     ucp_context.c:1778 UCX  INFO  UCP version is 1.12 (release 0)
   [1648461566.472887] [my-test-host:4147425:0]      ucp_worker.c:2118 UCX  DIAG  multi-threaded worker is requested, but library is built without multi-thread support
   [1648461566.514306] [my-test-host:4147425:0]          parser.c:1914 UCX  INFO  UCX_* env variable: UCX_LOG_LEVEL=INFO
   [1648461566.514950] [my-test-host:4147425:0]     ucp_context.c:1778 UCX  INFO  UCP version is 1.12 (release 0)
   [1648461566.581439] [my-test-host:4147425:0]            sock.c:660  UCX  ERROR unknown address family: 0
   [1648461566.585326] [my-test-host:4147425:a]       rdmacm_cm.c:351  UCX  DIAG  [cep 0xaaab1a74ddf0 ::1:49784->::1:55919 client Success]: rdma_resolve_route failed: No such device
   [1648461566.585361] [my-test-host:4147425:a]          uct_cm.c:99   UCX  DIAG  resolve callback failed with error: Destination is unreachable
   [1648461566.629139] [my-test-host:4147425:1]      ucp_worker.c:1866 UCX  INFO    ep_cfg[0]: am(tcp/lo);
   [1648461566.629540] [my-test-host:4147425:1]            sock.c:325  UCX  ERROR   connect(fd=85, dest_addr=::1:47937) failed: Connection refused
   [1648461566.629552] [my-test-host:4147425:1]       wireup_cm.c:1203 UCX  WARN  server ep 0xffff9e27b000 failed to connect to remote address on device lo, tl_bitmap 0x40 0x0, status Destination is unreachable
   [1648461566.629619] [my-test-host:4147425:0]          ucp_ep.c:1222 UCX  DIAG  ep 0xffff9f7e6000: error 'Operation rejected by remote peer' on CM lane will not be handled since no error callback is installed
   ../src/arrow/flight/transport/ucx/ucx_server.cc:490: [server][peer=a00:8c0a:::35850] Failed to create endpoint: IOError: ucp_ep_create: UCX error -6: UCS_ERR_UNREACHABLE Destination is unreachable
   ../src/arrow/flight/test_definitions.cc:142: Failure
   Failed
   'client_->GetFlightInfo(descr, &info)' failed with IOError: ucp_request_check_status: UCX error -23: UCS_ERR_REJECTED Operation rejected by remote peer
   [1648461566.630037] [my-test-host:4147425:0]           flush.c:26   UCX  ERROR req 0xaaab1a79e380: error during flush: Endpoint timeout, flush comp 0xaaab1a79e420 count reduced to 1
   [1648461566.630047] [my-test-host:4147425:0]           flush.c:26   UCX  ERROR req 0xaaab1a79e380: error during flush: Endpoint timeout, flush comp 0xaaab1a79e420 count reduced to 0
   free(): invalid pointer
   Aborted (core dumped)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

