Does this problem exist in 1.3.0?
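
For the record, a quick way to check which version each node is actually
running; the first command reports the installed CLI, the second asks the
running daemon itself (assuming the default control socket):

    ovs-vsctl --version
    ovs-appctl version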
On Thu, 24 May 2012 14:33:44 +0200, Oliver Francke writes:
> Hi Volkan,
>
> thanks for your attention.
>
> On 05/24/2012 12:21 PM, Volkan YAZICI wrote:
>> Hi Oliver!
>>
>> I have a triangle-shaped network topology of three OVS 1.4.1 switches.
>> When I plug a host into the network, I immediately start to observe
>> the same log pollution and excessive CPU usage as yours. I either need
>> to unplug the machine or break the triangle into a straight line by
>> removing one of the connections between the switches. Have you made
>> any progress on the problem on your side? Any clues?
>
> No progress and no reply. I restarted ovs-vswitchd with the log file
> enabled to see some more, but there is too much JSON output.
> And Ben told me it should be far below 10%, instead of my 40% average
> with spikes of 100% for several minutes.
>
> Perhaps I'll go through the logfile if time permits...
>
> Thanks and regards,
>
> Oliver
>
>> Best.
>>
>> On Thu, 17 May 2012 13:58:00 +0200, Oliver Francke writes:
>>> Hi,
>>>
>>> uhm, I think I have my firewall provisioning ready for production,
>>> but still temporarily high load from ovs-vswitchd.
>>>
>>> Anybody with a clue of what's going on there?
>>>
>>> --- 8-< ---
>>>
>>> May 17 13:54:07 fcmsnode10 ovs-vswitchd: 1844633|poll_loop|WARN|Dropped 771 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
>>> May 17 13:54:07 fcmsnode10 ovs-vswitchd: 1844634|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (101% CPU usage)
>>> May 17 13:54:07 fcmsnode10 ovs-vswitchd: 1844635|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (101% CPU usage)
>>> May 17 13:54:08 fcmsnode10 ovs-vswitchd: 1844636|timeval|WARN|105 ms poll interval (56 ms user, 44 ms system) is over 152 times the weighted mean interval 1 ms (342116319 samples)
>>> May 17 13:54:08 fcmsnode10 ovs-vswitchd: 1844637|timeval|WARN|context switches: 0 voluntary, 2 involuntary
>>> May 17 13:54:08 fcmsnode10 ovs-vswitchd: 1844638|coverage|INFO|Skipping details of duplicate event coverage for hash=959f79a0 in epoch 342116319
>>> May 17 13:54:08 fcmsnode10 ovs-vswitchd: 1844639|poll_loop|WARN|Dropped 880 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
>>>
>>> --- 8-< ---
>>>
>>> and ovs-dpctl shows:
>>>
>>> system@vmbr1:
>>>     lookups: hit:269430948 missed:1076470 lost:1
>>>     flows: 6
>>>     port 0: vmbr1 (internal)
>>>     port 1: eth1
>>>     port 4: vlan10 (internal)
>>>     port 5: tap822i1d0
>>>     port 6: tap822i1d1
>>>     port 7: tap410i1d0
>>>     port 9: tap1113i1d0
>>>     port 13: tap433i1d0
>>>     port 15: tap377i1d0
>>>     port 16: tap416i1d0
>>>     port 18: tap287i1d0
>>>     port 19: tap451i1d0
>>>     port 23: tap160i1d0
>>>     port 24: tap376i1d0
>>>     port 27: tap1084i1d0
>>>     port 28: tap1085i1d0
>>>     port 30: tap760i1d0
>>>     port 31: tap339i1d0
>>> system@vmbr0:
>>>     lookups: hit:15321943230 missed:8565995663 lost:201094006
>>>     flows: 15216
>>>     port 0: vmbr0 (internal)
>>>     port 1: vlan146 (internal)
>>>     port 2: eth0
>>>     .
>>>     .
>>>     .
>>>
>>> Hints welcome, because my fear is that if I put a couple of rules on
>>> some 120 VMs, there could be performance issues when load is high
>>> anyway.
>>>
>>> Kind regards,
>>>
>>> Oliver.
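
A side note on the ovs-dpctl numbers above: vmbr0 reports roughly 8.6
billion misses against 15.3 billion hits, plus about 201 million lost
packets, so a large share of its traffic is being punted from the kernel
fast path up to userspace, which by itself would explain ovs-vswitchd
burning CPU. And if there is a forwarding loop in the topology, as in
Volkan's triangle of three switches, flooded packets can circulate
indefinitely and keep the miss rate high. A minimal sketch of the usual
workaround, assuming the looped bridges are named vmbr0 on all three
nodes and the OVS build has spanning tree support:

    # Enable STP on every bridge that is part of the loop, so that one
    # of the redundant links goes into blocking state.
    ovs-vsctl set bridge vmbr0 stp_enable=true

    # Then watch whether the wakeup storm and the datapath miss rate
    # calm down.
    ovs-dpctl show

This is only a guess from the stats, not a confirmed diagnosis.
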
>>> On 10.05.2012 at 10:25, Oliver Francke wrote:
>>>
>>>> Hi *,
>>>>
>>>> as suggested by Ben, I upgraded to 1.4.1 and configured all 4 nodes
>>>> with the following command, which worked seamlessly:
>>>>
>>>>> ovs-vsctl set bridge vmbr0 other-config:flow-eviction-threshold=10000
>>>>
>>>> And now to the section with the "but" in it:
>>>>
>>>> But: still high load, polluting syslog (not constantly, but every so
>>>> often) with:
>>>>
>>>> --- 8-< ---
>>>>
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139643|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139644|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139645|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139646|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139647|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139648|timeval|WARN|52 ms poll interval (10 ms user, 10 ms system) is over 19 times the weighted mean interval 3 ms (31293432 samples)
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139649|timeval|WARN|context switches: 0 voluntary, 136 involuntary
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139650|coverage|INFO|Event coverage (epoch 31293432/entire run), hash=0a8403eb:
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139651|coverage|INFO|ofproto_dpif_xlate 30 / 418241133
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139652|coverage|INFO|flow_extract 15 / 119894932
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139653|coverage|INFO|hmap_pathological 6 / 187853714
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139654|coverage|INFO|hmap_expand 275 / 1030432349
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139655|coverage|INFO|netdev_get_stats 117 / 1275136
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139656|coverage|INFO|poll_fd_wait 15 / 469400883
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139657|coverage|INFO|util_xalloc 21740 / 85658737570
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139658|coverage|INFO|netdev_ethtool 234 / 2550522
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139659|coverage|INFO|netlink_received 486 / 444888366
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139660|coverage|INFO|netlink_sent 264 / 361913061
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139661|coverage|INFO|bridge_reconfigure 0 / 6
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139662|coverage|INFO|ofproto_flush 0 / 2
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139663|coverage|INFO|ofproto_update_port 0 / 131
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139664|coverage|INFO|facet_revalidate 0 / 157765
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139665|coverage|INFO|facet_unexpected 0 / 1
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139666|coverage|INFO|dpif_port_add 0 / 2
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139667|coverage|INFO|dpif_port_del 0 / 2
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139668|coverage|INFO|dpif_flow_flush 0 / 4
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139669|coverage|INFO|dpif_flow_put 0 / 445
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139670|coverage|INFO|dpif_flow_del 0 / 119661491
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139671|coverage|INFO|dpif_purge 0 / 2
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139672|coverage|INFO|mac_learning_learned 0 / 6111
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139673|coverage|INFO|mac_learning_expired 0 / 5598
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139674|coverage|INFO|poll_zero_timeout 0 / 6190
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139675|coverage|INFO|pstream_open 0 / 4
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139676|coverage|INFO|stream_open 0 / 1
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139677|coverage|INFO|netdev_set_policing 0 / 706
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139678|coverage|INFO|netdev_get_ifindex 0 / 123
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139679|coverage|INFO|netdev_get_hwaddr 0 / 125
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139680|coverage|INFO|nln_changed 0 / 137
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139681|coverage|INFO|netlink_recv_jumbo 0 / 16397628
>>>> May  8 06:30:15 fcmsnode0 ovs-vswitchd: 139682|coverage|INFO|47 events never hit
>>>> May  8 06:30:16 fcmsnode0 ovs-vswitchd: 139683|poll_loop|WARN|Dropped 216 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
>>>> May  8 06:30:16 fcmsnode0 ovs-vswitchd: 139684|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>>>> May  8 06:30:16 fcmsnode0 ovs-vswitchd: 139685|poll_loop|WARN|wakeup due to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at lib/dpif-linux.c:1197 (52% CPU usage)
>>>> May  8 06:30:17 fcmsnode0 ovs-vswitchd: 139686|poll_loop|WARN|Dropped 480 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
>>>>
>>>> --- 8-< ---
>>>>
>>>> Perhaps there is already a hint in the stats... If not, how do I dig
>>>> into it further?
>>>>
>>>> Kind regards,
>>>>
>>>> Oliver.
>>>>
>>>> --
>>>>
>>>> Oliver Francke
>>>>
>>>> filoo GmbH
>>>> Moltkestraße 25a
>>>> 33330 Gütersloh
>>>> HRB4355 AG Gütersloh
>>>>
>>>> Managing directors: S.Grewing | J.Rehpöhler | C.Kunz
>>>>
>>>> Follow us on Twitter: http://twitter.com/filoogmbh
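
One more thing that may help while digging further: the syslog pollution
itself can be tamed at runtime, since vlog levels are adjustable per
module and per facility without restarting the daemon. A minimal sketch,
assuming the default ovs-vswitchd control socket:

    # List the logging modules and their current levels.
    ovs-appctl vlog/list

    # Quiet the two chatty modules from the excerpts above on syslog
    # only, while leaving the log file at full verbosity.
    ovs-appctl vlog/set poll_loop:syslog:err
    ovs-appctl vlog/set timeval:syslog:err

That only hides the symptom, of course; the wakeup storm itself still
needs an explanation.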
