Hi,
With the latest nightly builds based on the tcpcloud R3.0 contrail-packages
branch, contrail-collector never comes up properly and stays stuck in the
"initializing (KafkaPub:192.168.0.111:9092,192.168.0.112:9092,192.168.0.113:9092
connection down)" state. In addition, contrail-alarm-gen is not sending
heartbeats to the discovery server.
contrail-collector logs a large number of messages like these:
----->8-----
2016-06-15 Wed 20:06:17:329.724 UTC ostack01 [Thread 140619476842240,
Pid 20]: LOG-4-PROTOERR: 192.168.0.112:9092/1: Protocol parse failure
at rd_kafka_produce_reply_handle:1795
2016-06-15 Wed 20:06:17:329.757 UTC ostack01 [Thread 140619476842240,
Pid 20]: LOG-4-PROTOERR: 192.168.0.112:9092/1: expected 4 bytes > 0
remaining bytes
2016-06-15 Wed 20:06:17:372.639 UTC ostack01 [Thread 140619187439360,
Pid 20]: Message delivery for 0x7fe4480073e0 Local: Bad message format
gen ostack01:Analytics:contrail-query-engine:0
2016-06-15 Wed 20:06:17:372.694 UTC ostack01 [Thread 140619187439360,
Pid 20]: Message delivery for 0x7fe4480073e0 Local: Bad message format
gen ostack01:Analytics:contrail-query-engine:0
2016-06-15 Wed 20:06:17:404.475 UTC ostack01 [Thread 140619502020352,
Pid 20]: LOG-4-PROTOERR: 192.168.0.111:9092/0: Protocol parse failure
at rd_kafka_produce_reply_handle:1795
2016-06-15 Wed 20:06:17:404.519 UTC ostack01 [Thread 140619502020352,
Pid 20]: LOG-4-PROTOERR: 192.168.0.111:9092/0: expected 4 bytes > 0
remaining bytes
2016-06-15 Wed 20:06:17:605.535 UTC ostack01 [Thread 140619485234944,
Pid 20]: LOG-4-PROTOERR: 192.168.0.111:9092/0: Protocol parse failure
at rd_kafka_produce_reply_handle:1795
2016-06-15 Wed 20:06:17:605.568 UTC ostack01 [Thread 140619485234944,
Pid 20]: LOG-4-PROTOERR: 192.168.0.111:9092/0: expected 4 bytes > 0
remaining bytes
2016-06-15 Wed 20:06:17:706.306 UTC ostack01 [Thread 140619485234944,
Pid 20]: LOG-4-PROTOERR: 192.168.0.111:9092/0: Protocol parse failure
at rd_kafka_produce_reply_handle:1795
2016-06-15 Wed 20:06:17:706.334 UTC ostack01 [Thread 140619485234944,
Pid 20]: LOG-4-PROTOERR: 192.168.0.111:9092/0: expected 4 bytes > 0
remaining bytes
----->8-----
And from the contrail-alarm-gen logs:
----->8-----
06/15/2016 08:08:35 PM [contrail-alarm-gen]: -uve-2 Reading offset 194
06/15/2016 08:08:35 PM [contrail-alarm-gen]: -uve-2 Ignoring UVE
OffsetAndMessage(offset=194, message=Message(crc=266531059, magic=0,
attributes=0, timestamp=None,
key='ObjectVRouter:ostack01|VrouterStatsAgent|ostack01:Compute:contrail-vrouter-agent:0|192.168.0.111:6379',
value='{}'))
06/15/2016 08:08:35 PM [contrail-alarm-gen]: -uve-2 Reading offset 195
06/15/2016 08:08:35 PM [contrail-alarm-gen]: -uve-2 Ignoring UVE
OffsetAndMessage(offset=195, message=Message(crc=862165380, magic=0,
attributes=0, timestamp=None,
key='ObjectVRouter:ostack03|VrouterStatsAgent|ostack03:Compute:contrail-vrouter-agent:0|192.168.0.111:6379',
value='{}'))
06/15/2016 08:08:35 PM [contrail-alarm-gen]: -uve-2 Reading offset 196
06/15/2016 08:08:35 PM [contrail-alarm-gen]: -uve-2 Ignoring UVE
OffsetAndMessage(offset=196, message=Message(crc=862165380, magic=0,
attributes=0, timestamp=None,
key='ObjectVRouter:ostack03|VrouterStatsAgent|ostack03:Compute:contrail-vrouter-agent:0|192.168.0.111:6379',
value='{}'))
----->8-----
The offset being read keeps increasing.
While investigating contrail-alarm-gen, I noticed its Python process
using 100% of a CPU core. It seems to be constantly running this loop:
https://github.com/Juniper/contrail-controller/blob/233878a5bb9dfc08eb1054eb873c2a6d4ff46b04/src/opserver/partition_handler.py#L607
and starving the other greenlets, one of them being the heartbeat.
I've talked with Jakub Pavlik, and they are seeing the same issue with
their deployments.
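
To illustrate the kind of starvation I suspect above, here is a minimal
sketch. It is not the contrail-alarm-gen code: it uses asyncio from the
standard library as a stand-in for gevent, and the names heartbeat,
consume and yield_every are made up for the example. The assumption is
that the consumer loop never hands control back to the scheduler, so a
cooperative heartbeat task never gets a turn.

```python
import asyncio


async def heartbeat(beats, stop):
    # Hypothetical heartbeat task: records one "beat" each time it is scheduled.
    while not stop.is_set():
        beats.append(1)
        await asyncio.sleep(0)  # hand control back to the scheduler


async def consume(messages, yield_every, stop):
    # Hypothetical consumer loop. With yield_every == 0 it never yields,
    # so no other cooperative task can run until it finishes.
    for i, _ in enumerate(messages, 1):
        if yield_every and i % yield_every == 0:
            await asyncio.sleep(0)  # cooperative yield point
    stop.set()


async def run(yield_every):
    beats = []
    stop = asyncio.Event()
    hb = asyncio.ensure_future(heartbeat(beats, stop))
    await consume(range(1000), yield_every, stop)
    await hb
    return len(beats)


starved = asyncio.run(run(0))      # consumer never yields
healthy = asyncio.run(run(100))    # consumer yields every 100 messages
print("beats without yielding:", starved)
print("beats with periodic yield:", healthy)
```

In gevent the equivalent yield point would be a gevent.sleep(0) call inside
the loop. Whether that is the right fix for partition_handler.py is
something I have not verified.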