Hi BIRD team!

We found a case where the BMP code, while trying to connect to the BMP 
collector service with sk_open(), drives CPU utilization up. To reproduce it:

  1.  The server machine to which BMP PDU packets are sent must be reachable 
      (i.e. it responds to ping).
  2.  The BMP collector service itself must not be running on that server.
  3.  Run BIRD with the BMP protocol enabled.

After that, you should observe that the BIRD process has significantly higher 
CPU utilization. This seems related to the BIRD socket, because when I capture 
network traffic on the host machine (where BIRD is running), I see a massive 
number of TCP packets exchanged between the BIRD host and the BMP collector 
machine. At the moment, the socket type used for the BMP connection is 
SK_TCP_ACTIVE.
Do you have any idea what is going wrong, or how the BIRD socket should be 
used properly?
As a temporary fix, I have attached a patch that avoids this issue, but it is 
a very ugly hack: it frees the BIRD socket outside of the IO code (sk_free()) 
and initializes the socket again every time an ECONNREFUSED error is passed to 
the err_hook callback.
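
One way to avoid retrying in a tight loop is to pace the reconnect attempts 
instead of re-creating the socket immediately on every ECONNREFUSED. The 
sketch below is plain C with no BIRD dependencies; reconnect_delay_ms() and 
both constants are hypothetical names chosen for illustration, not part of 
BIRD. In BIRD the computed delay would presumably be handed to a timer whose 
callback re-opens the socket, but that wiring is an assumption and is not 
shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants, not taken from BIRD. */
#define RECONNECT_BASE_MS   1000u   /* delay after the first failure */
#define RECONNECT_MAX_MS   60000u   /* upper bound for the delay */

/*
 * Return the delay before the next connect attempt, given the number of
 * consecutive failures so far (1 = first failure). The delay doubles with
 * each failure and saturates at RECONNECT_MAX_MS.
 */
static uint32_t
reconnect_delay_ms(unsigned failures)
{
  assert(failures >= 1);

  uint32_t delay = RECONNECT_BASE_MS;

  /* Stop doubling once the cap is reached, so the value cannot overflow. */
  for (unsigned i = 1; i < failures && delay < RECONNECT_MAX_MS; i++)
    delay *= 2;

  return (delay > RECONNECT_MAX_MS) ? RECONNECT_MAX_MS : delay;
}
```

The failure counter would be reset to zero once a connection succeeds, so a 
collector that comes back is picked up again within one second.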

I also need a tip: is there a way to get a notification from the BIRD socket 
when the connection to the BMP collector service is lost? One option is to 
check whether sk_send() failed, but what if there are no updates to send for a 
long time? I would like to be notified as soon as the connection is lost. Is 
this possible with the current BIRD implementation, or should I add a timer 
callback that somehow checks whether the BMP collector service is alive? I 
need this mechanism to synchronize/re-send all BMP data to the collector.
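
To make the timer idea concrete, here is a minimal sketch of what such a 
periodic check could do on a plain POSIX socket. It is not BIRD code: 
peer_seems_alive() is a hypothetical helper, and it works on a raw file 
descriptor rather than on a BIRD sock. It only notices a close that the local 
kernel already knows about (the peer sent FIN or RST); a collector that 
vanishes silently would still need something like TCP keepalive.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Return 1 if the peer still appears to be connected, 0 if it has closed
 * the connection or the socket is in an error state.
 */
static int
peer_seems_alive(int fd)
{
  assert(fd >= 0);

  char byte;

  /* Peek without blocking and without consuming any pending data. */
  ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);

  if (n > 0)
    return 1;   /* data is pending, so the connection is up */
  if (n == 0)
    return 0;   /* orderly shutdown by the peer */

  /* n < 0: "nothing to read right now" is fine, anything else is a failure */
  return (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR);
}
```

A timer callback could call this every few seconds and trigger the 
re-synchronization once it returns 0.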

We have currently switched to the BMP code from the bmp branch of the BIRD 
GitLab repository.

Additionally, I have a question about the code below. Can I free the list node 
and the node data itself when sk_send() returns a value greater than or equal 
to 0 (>= 0), as in this code?

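  WALK_LIST_DELSAFE(tx_data, tx_data_next, p->tx_queue)
  {
    ...
    rv = sk_send(p->sk, data_size);
    if (rv < 0) {
      /* send failed: leave the node on the queue */
      return;
    }

    /* rv >= 0: free the payload, unlink the node, then free the node */
    mb_free(tx_data->data);
    rem_node((node *) tx_data);
    mb_free(tx_data);
    if (rv == 0) {
      /* stop walking the queue for now */
      return;
    }
    ...
  }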

Or should I do that only when sk_send() returns a value greater than 0 (> 0)? 
My goal is to still send all the data in the list when sk_send() has hit only 
a "temporary" problem.
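
To spell out the two variants, here is a self-contained model of the loop in 
plain C. It assumes the contract that BIRD's socket documentation describes 
for sk_send(): a positive value means the data was sent immediately, 0 means 
it was queued and tx_hook is called once the transmission takes place, and a 
negative value means an error. Under that assumption, freeing the node on 
rv >= 0 loses nothing. All names here (tx_item, send_fn, drain_queue, 
queue_push, scripted_send) are hypothetical stand-ins, not BIRD API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one queued BMP message. */
struct tx_item {
  struct tx_item *next;
  size_t len;
  char *data;
};

/* Send callback modelled on the assumed sk_send() contract:
 * > 0 sent immediately, 0 accepted but still pending, < 0 error. */
typedef int (*send_fn)(const char *data, size_t len, void *ctx);

/* Append a copy of data to the tail of the queue; 0 on success, -1 on OOM. */
static int
queue_push(struct tx_item **head, const char *data, size_t len)
{
  struct tx_item *it = malloc(sizeof(*it));
  if (!it)
    return -1;
  it->data = malloc(len ? len : 1);
  if (!it->data) {
    free(it);
    return -1;
  }
  memcpy(it->data, data, len);
  it->len = len;
  it->next = NULL;
  while (*head)
    head = &(*head)->next;
  *head = it;
  return 0;
}

/* Hand queued items to fn in order; return how many were accepted and freed. */
static size_t
drain_queue(struct tx_item **head, send_fn fn, void *ctx)
{
  assert(head && fn);
  size_t done = 0;

  while (*head) {
    struct tx_item *it = *head;
    int rv = fn(it->data, it->len, ctx);

    if (rv < 0)
      break;            /* error: keep the item for a later retry */

    /* rv >= 0: the transport took the data, so the item can be released */
    *head = it->next;
    free(it->data);
    free(it);
    done++;

    if (rv == 0)
      break;            /* transport busy: continue from the tx callback */
  }
  return done;
}

/* Test double: replays a fixed list of return values, then reports errors. */
struct script { const int *rv; size_t n, pos; };

static int
scripted_send(const char *data, size_t len, void *ctx)
{
  struct script *s = ctx;
  (void) data;
  (void) len;
  return (s->pos < s->n) ? s->rv[s->pos++] : -1;
}
```

With three queued items and scripted results 1, 0, -1, the first call to 
drain_queue() releases two items and stops; the third stays queued until the 
next call, which is the behaviour wanted for a "temporary" problem.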


Thanks,
----


Pawel Maslanka
Senior Software Engineer



Office: +1.617.444.1234
Cell: +1.617.444.1234

Akamai Technologies
150 Broadway
Cambridge, MA 02142




Attachment: bmp_connect_failed.patch
Description: bmp_connect_failed.patch
