Hello Amos,

Yes, I used *only* the control.parallel to launch both the server and
client in parallel.

Is that the correct approach? Is there a better approach?

Thanks for the reply.
Bhupesh
On 06/24/2014 09:00 PM, Amos Kong wrote:
> On Tue, Jun 24, 2014 at 07:36:49PM -0400, Bhupesh Purandare wrote:
>> Hello,
>> I am trying to run netperf2 (parallel) in autotest 0.15.1. (Similar results 
>> in
>> autotest 0.16.0)
> What do you mean by "parallel" here?
>
> There is a single control file (control.parallel) to lunch both client
> and server & control the tests. Did you _only_ use it?
>
>> But the test keeps failing because the barrier slave keeps sending a Bad
>> connection request to master.
>> I am trying to setup machine 172.25.43.234 as client and 172.25.43.226 as the
>> server in the netperf2(parallel) control file.
>>
>> The client and server jobs pass on the server machine (172.25.43.226), but 
>> fail
>> on the client machine (172.25.43.234) with an exception:
>>
>> BarrierError: timeout waiting for barrier: start_1
>>
>> I am trying to understand why the job fails on the client machine and not on
>> the server machine.
>>
>> Here are the relevant lines from /results/19-debug_user/172.25.43.234/debug/
>> client.0.DEBUG   (entire log directory is attached)
>>
>> 06/24 22:51:01 DEBUG|base_utils:0077| Running 
>> '/ghostcache/autotest/tmp/netperf2/src/src/netserver&'
>> 06/24 22:51:01 DEBUG|base_utils:0114| [stdout] Starting netserver with host 
>> 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC
>> 06/24 22:51:01 INFO |base_barri:0163| tag=start_1 port=11922 timeout=600 
>> hostid=172.25.43.226#netperf-server
>> 06/24 22:51:01 INFO |base_barri:0497| masterid: 172.25.43.226#netperf-server
>> 06/24 22:51:01 INFO |base_barri:0503| members: 172.25.43.234#netperf-client
>> 06/24 22:51:01 INFO |base_barri:0510| selected as master: self._hostid = 
>> 172.25.43.226#netperf-server , self._masterid = 172.25.43.226#netperf-server
>> 06/24 22:51:01 INFO |base_barri:0189| seconds remaining: 599
>> 06/24 22:51:01 INFO |base_barri:0163| tag=start_1 port=11922 timeout=600 
>> hostid=172.25.43.234#netperf-client
>> 06/24 22:51:01 INFO |base_barri:0497| masterid: 172.25.43.226#netperf-server
>> 06/24 22:51:01 INFO |base_barri:0503| members: 172.25.43.234#netperf-client
>> 06/24 22:51:01 INFO |base_barri:0513| selected as slave
>> 06/24 22:51:01 INFO |base_barri:0189| seconds remaining: 599
>> 06/24 22:51:01 INFO |base_barri:0189| seconds remaining: 599
>> 06/24 22:51:01 INFO |base_barri:0406| calling master
>> 06/24 22:51:11 INFO |base_barri:0189| seconds remaining: 589
>> 06/24 22:51:11 INFO |base_barri:0189| seconds remaining: 589
>> 06/24 22:51:11 INFO |base_barri:0406| calling master
>> 06/24 22:51:11 INFO |base_barri:0263| master said: !tag
>> 06/24 22:51:11 WARNI|base_barri:0267| Bad connection request to master
>> 06/24 22:51:11 INFO |base_barri:0189| seconds remaining: 589
>> 06/24 22:51:11 INFO |base_barri:0189| seconds remaining: 589
>> 06/24 22:51:11 INFO |base_barri:0406| calling master
>> 06/24 22:51:11 INFO |base_barri:0263| master said: !tag
>> ....
>> ....
>> ....
>> 06/24 22:51:21 WARNI|base_barri:0267| Bad connection request to master
>> 06/24 22:51:21 INFO |base_barri:0189| seconds remaining: 579
>> 06/24 22:51:21 INFO |base_barri:0189| seconds remaining: 579
>> 06/24 22:51:21 INFO |base_barri:0406| calling master
>> 06/24 22:51:21 ERROR|  parallel:0026| child process failed
>> 06/24 22:51:21 DEBUG|  parallel:0030| Traceback (most recent call last):
>> 06/24 22:51:21 DEBUG|  parallel:0030|   File 
>> "/ghostcache/autotest/parallel.py", line 18, in fork_start
>> 06/24 22:51:21 DEBUG|  parallel:0030|     l()
>> 06/24 22:51:21 DEBUG|  parallel:0030|   File "/ghostcache/autotest/job.py", 
>> line 529, in <lambda>
>> ....
>> ....
>> 06/24 22:51:21 DEBUG|  parallel:0030|     self._slave_hello(connection)
>> 06/24 22:51:21 DEBUG|  parallel:0030|   File 
>> "/ghostcache/autotest/shared/base_barrier.py", line 261, in _slave_hello
>> 06/24 22:51:21 DEBUG|  parallel:0030|     reply = client.recv(4)
>> 06/24 22:51:21 DEBUG|  parallel:0030| error: [Errno 104] Connection reset by 
>> peer
>> 06/24 22:51:21 DEBUG|  parallel:0030|
>> 06/24 22:51:21 INFO |       job:0212|           FAIL    netperf2.client 
>> netperf2.client timestamp=1403650281    localtime=Jun 24 22:51:21       
>> Unhandled error: [Errno 104] Connection reset by peer
>> ...
>> ...
>> ...
>> 06/24 23:01:01 DEBUG|  parallel:0030| BarrierError: timeout waiting for 
>> barrier: start_1
>>
>>
>>
>> 06/24 23:01:01 DEBUG|  parallel:0030| BarrierError: timeout waiting for 
>> barrier: start_1
>>
>> 06/24 22:51:01 DEBUG|base_utils:0077| Running 'killall netserver'
>> 06/24 22:51:01 DEBUG|base_utils:0077| Running 
>> '/ghostcache/autotest/tmp/netperf2/src/src/netserver&'
>> 06/24 22:51:01 DEBUG|base_utils:0114| [stdout] Starting netserver with host 
>> 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC
>> 06/24 22:51:01 INFO |base_barri:0163| tag=start_1 port=11922 timeout=600 
>> hostid=172.25.43.226#netperf-server
>> 06/24 22:51:01 INFO |base_barri:0497| masterid: 172.25.43.226#netperf-server
>> 06/24 22:51:01 INFO |base_barri:0503| members: 172.25.43.234#netperf-client
>> 06/24 22:51:01 INFO |base_barri:0510| selected as master: self._hostid = 
>> 172.25.43.226#netperf-server , self._masterid = 172.25.43.226#netperf-server
>> 06/24 22:51:01 INFO |base_barri:0189| seconds remaining: 599
>> 06/24 22:51:01 INFO |base_barri:0163| tag=start_1 port=11922 timeout=600 
>> hostid=172.25.43.234#netperf-client
>> 06/24 22:51:01 INFO |base_barri:0497| masterid: 172.25.43.226#netperf-server
>> 06/24 22:51:01 INFO |base_barri:0503| members: 172.25.43.234#netperf-client
>> 06/24 22:51:01 INFO |base_barri:0513| selected as slave
>> 06/24 22:51:01 INFO |base_barri:0189| seconds remaining: 599
>> 06/24 22:51:01 INFO |base_barri:0189| seconds remaining: 599
>> 06/24 22:51:01 INFO |base_barri:0406| calling master
>> 06/24 22:51:01 INFO |base_barri:0211| new client tag=start_1, 
>> name=172.25.43.234#netperf-client, hostid=172.25.43.226#netperf-server
>> 06/24 22:51:01 INFO |base_barri:0246| client now waiting: 
>> 172.25.43.234#netperf-client (172.25.43.226:56000)
>> 06/24 22:51:01 INFO |base_barri:0263| master said: wait
>> 06/24 22:51:01 INFO |base_barri:0281| slave now waiting: 
>> (172.25.43.226:11922)
>> 06/24 22:51:01 INFO |base_barri:0372| master seen 1 of 1
>> 06/24 22:51:01 INFO |base_barri:0430| slave connected to master
>> 06/24 22:51:01 INFO |base_barri:0296| checking client present: 
>> 172.25.43.234#netperf-client
>> 06/24 22:51:01 INFO |base_barri:0189| seconds remaining: 599
>> 06/24 22:51:01 INFO |base_barri:0449| master said: ping
>> 06/24 22:51:01 INFO |base_barri:0461| pong
>> 06/24 22:51:01 INFO |base_barri:0189| seconds remaining: 19
>> 06/24 22:51:01 INFO |base_barri:0320| Releasing clients
>> 06/24 22:51:01 INFO |base_barri:0189| seconds remaining: 19
>> 06/24 22:51:01 INFO |base_barri:0449| master said: rlse
>> 06/24 22:51:01 INFO |base_barri:0344| closing client: 
>> 172.25.43.234#netperf-client
>> 06/24 22:51:01 INFO |base_barri:0470| was released, waiting for close
>> 06/24 22:51:01 INFO |base_barri:0163| tag=stop_1 port=11922 timeout=315 
>> hostid=172.25.43.226#netperf-server
>> 06/24 22:51:01 INFO |base_barri:0189| seconds remaining: 19
>> 06/24 22:51:01 INFO |base_barri:0497| masterid: 172.25.43.226#netperf-server
>> 06/24 22:51:01 INFO |base_barri:0344| closing client: 
>> 172.25.43.234#netperf-client
>> 06/24 22:51:01 INFO |base_barri:0503| members: 172.25.43.234#netperf-client
>> 06/24 22:51:01 DEBUG|base_utils:0077| Running 'sleep 5 && mpstat -P ALL 3 5'
>> 06/24 22:51:01 INFO |base_barri:0510| selected as master: self._hostid = 
>> 172.25.43.226#netperf-server , self._masterid = 172.25.43.226#netperf-server
>> 06/24 22:51:01 INFO |base_barri:0189| seconds remaining: 314
>> 06/24 22:51:01 DEBUG|base_utils:0077| Running 
>> '/ghostcache/autotest/tmp/netperf2/src/src/netperf -H 172.25.43.226 -t 
>> TCP_STREAM -l 15 -s 5 '
>> 06/24 22:51:11 INFO |base_barri:0211| new client tag=start_1, 
>> name=172.25.43.234#netperf-client, hostid=172.25.43.226#netperf-server
>> 06/24 22:51:11 WARNI|base_barri:0220| client arriving for the wrong barrier: 
>> stop_1 != start_1
>> 06/24 22:51:11 INFO |base_barri:0372| master seen 0 of 1
>> 06/24 22:51:11 INFO |base_barri:0189| seconds remaining: 305
>> 06/24 22:51:11 INFO |base_barri:0211| new client tag=start_1, 
>> name=172.25.43.234#netperf-client, hostid=172.25.43.226#netperf-server
>> 06/24 22:51:11 WARNI|base_barri:0220| client arriving for the wrong barrier: 
>> stop_1 != start_1
>> 06/24 22:51:11 INFO |base_barri:0372| master seen 0 of 1
>> 06/24 22:51:11 INFO |base_barri:0189| seconds remaining: 305
>> ...
>> ...
>> ...
>> ...
>> 06/24 22:51:21 INFO |base_barri:0211| new client tag=start_1, 
>> name=172.25.43.234#netperf-client, hostid=172.25.43.226#netperf-server
>> 06/24 22:51:21 WARNI|base_barri:0220| client arriving for the wrong barrier: 
>> stop_1 != start_1
>> 06/24 22:51:21 INFO |base_barri:0372| master seen 0 of 1
>> 06/24 22:51:21 INFO |base_barri:0189| seconds remaining: 294
>> 06/24 22:51:21 INFO |  netperf2:0179| [* Command:
>>     sleep 5 && mpstat -P ALL 3 5
>> Exit status: 0
>> Duration: 20.0016210079
>> ...
>> ...
>> ...
>> ...
>> /ghostcache/autotest/tmp/netperf2/src/src/netperf -H 172.25.43.226 -t
>>     TCP_STREAM -l 15 -s 5
>> Exit status: 1
>> Duration: 19.0096800327
>>
>> stdout:
>> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
>> 172.25.43.226 () port 0 AF_INET
>> send_data: data send error: errno 104
>> stderr:
>> netperf: send_omni: send_data failed: Connection reset by peer]
>> 06/24 22:51:21 INFO |base_barri:0211| new client tag=start_1, 
>> name=172.25.43.234#netperf-client, hostid=172.25.43.226#netperf-server
>> 06/24 22:51:21 WARNI|base_barri:0220| client arriving for the wrong barrier: 
>> stop_1 != start_1
>> 06/24 22:51:21 INFO |base_barri:0372| master seen 0 of 1
>> 06/24 22:51:21 INFO |base_barri:0189| seconds remaining: 294
>> 06/24 22:51:21 INFO |base_barri:0163| tag=stop_1 port=11922 timeout=300 
>> hostid=172.25.43.234#netperf-client
>> 06/24 22:51:21 INFO |base_barri:0497| masterid: 172.25.43.226#netperf-server
>> 06/24 22:51:21 INFO |base_barri:0503| members: 172.25.43.234#netperf-client
>> 06/24 22:51:21 INFO |base_barri:0513| selected as slave
>> 06/24 22:51:21 INFO |base_barri:0189| seconds remaining: 299
>> 06/24 22:51:21 INFO |base_barri:0211| new client tag=start_1, 
>> name=172.25.43.234#netperf-client, hostid=172.25.43.226#netperf-server
>> 06/24 22:51:21 INFO |base_barri:0189| seconds remaining: 299
>> 06/24 22:51:21 WARNI|base_barri:0220| client arriving for the wrong barrier: 
>> stop_1 != start_1
>> 06/24 22:51:21 INFO |base_barri:0406| calling master
>> 06/24 22:51:21 INFO |base_barri:0372| master seen 0 of 1
>> 06/24 22:51:21 INFO |base_barri:0189| seconds remaining: 294
>> 06/24 22:51:21 INFO |base_barri:0211| new client tag=stop_1, 
>> name=172.25.43.234#netperf-client, hostid=172.25.43.226#netperf-server
>> 06/24 22:51:21 INFO |base_barri:0246| client now waiting: 
>> 172.25.43.234#netperf-client (172.25.43.226:56048)
>> 06/24 22:51:21 INFO |base_barri:0263| master said: wait
>> 06/24 22:51:21 INFO |base_barri:0281| slave now waiting: 
>> (172.25.43.226:11922)
>> 06/24 22:51:21 INFO |base_barri:0372| master seen 1 of 1
>> 06/24 22:51:21 INFO |base_barri:0430| slave connected to master
>> 06/24 22:51:21 INFO |base_barri:0296| checking client present: 
>> 172.25.43.234#netperf-client
>> 06/24 22:51:21 INFO |base_barri:0189| seconds remaining: 299
>> 06/24 22:51:21 INFO |base_barri:0449| master said: ping
>> 06/24 22:51:21 INFO |base_barri:0461| pong
>> 06/24 22:51:21 INFO |base_barri:0189| seconds remaining: 19
>> 06/24 22:51:21 INFO |base_barri:0189| seconds remaining: 19
>> 06/24 22:51:21 INFO |base_barri:0320| Releasing clients
>> 06/24 22:51:21 INFO |base_barri:0449| master said: rlse
>> 06/24 22:51:21 INFO |base_barri:0344| closing client: 
>> 172.25.43.234#netperf-client
>> 06/24 22:51:21 INFO |base_barri:0470| was released, waiting for close
>> 06/24 22:51:21 DEBUG|base_utils:0077| Running 'killall netserver'
>> 06/24 22:51:21 INFO |base_barri:0189| seconds remaining: 19
>> 06/24 22:51:21 INFO |base_barri:0344| closing client: 
>> 172.25.43.234#netperf-client
>> 06/24 22:51:21 INFO | net_utils:0069| Linux 3.10.24-3.10.0.2-amd64-12321334 
>> (a172-25-43-226.deploy.akamaitechnologies.com)      06/24/14        _x86_64_ 
>>        (8 CPU)
>> ...
>> ...
>> ...
>> ...
>>
>>
>>
>>
>> Eventually netperf itself seems to have failed
>>
>>
>> /usr/local/autotest/results/19-debug_user$  cat 
>> ./172.25.43.226/netperf2.client
>> /results/keyval
>> stream_count{attr}=1
>> CPU_C_HI{perf}=0.0
>> CPU_C_SI{perf}=2.30666666667
>> CPU_C_SYS{perf}=20.0366666667
>> CPU_C{perf}=100.0
>> Efficieny_C{perf}=0.0
>> Failed_streams_count{perf}=1
>> INTRS_C{perf}=71.3366666667
>> Throughput{perf}=0
>> actual_time{perf}=15.0036661625
>
>> Any help in understanding these barrier failures will be very helpful.
>> Thanks,
>> Bhupesh
>>
>
>
> --
>                         Amos.

_______________________________________________
Autotest-kernel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/autotest-kernel

Reply via email to