jeremiaswerner commented on issue #2744: Deploy kafka & zookeeper cluster with ansible
URL: https://github.com/apache/incubator-openwhisk/pull/2744#issuecomment-343194451

@devbv Thanks a lot for this very valuable and appreciated contribution. I've played around with the change and have left a few minor comments.

I ran the following manual resiliency test on my docker-machine deployment with two kafka containers. To get the test to pass I had to change the setting to `replicationFactor: 2` (instead of `1`). See my explanation below and my comment on the setting.

Prereq:
- Deploy this PR with two kafka nodes `kafka0, kafka1` in the docker-machine
- Create a simple hello world action

Test:
1. invoke an action and expect it to succeed -> works as expected
2. stop kafka0
3. invoke an action and expect it to succeed -> works as expected
4. wait some time to let the invoker idle
5. check the invoker logs to see if health pings reach the controller -> works as expected
6. stop kafka1
7. invoke an action and expect it to fail -> works as expected
8. start kafka1 and/or kafka0
9. invoke an action and expect it to succeed again -> works as expected

If we don't set the `replicationFactor` to `2`, the topics seem to be distributed across the two nodes of the cluster, with each topic living on only one of them. The health topic then lives on a single node, and if we kill that node the controller assumes all invokers are down, because the invokers can no longer send their health pings to the controller. For HA and resiliency reasons I therefore vote to make the `replicationFactor` configurable per environment.

I'm not 100% sure whether the `replicationFactor` must be equal to the number of kafka nodes, or whether a higher `replicationFactor` has a negative impact on latency and throughput. I'll therefore test latency and throughput in a distributed deployment with different values of `replicationFactor` and post the results here as datapoints.
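To illustrate the `replicationFactor` reasoning above, here is a minimal toy model in Python. It is only a sketch under stated assumptions: replicas are placed round-robin, each topic has a single partition, and a topic counts as reachable as long as one replica sits on a live node. It does not talk to Kafka and ignores leader election and in-sync replica handling. The function names and the `invoker0` topic name are made up for the example; only the health topic and the node names `kafka0`, `kafka1` come from the test above.

```python
def assign_replicas(topics, brokers, replication_factor):
    """Toy round-robin placement: map each topic to the set of brokers holding a replica."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for i, topic in enumerate(topics):
        # Start at a different broker per topic, then take the next ones in order.
        placement[topic] = {
            brokers[(i + r) % len(brokers)] for r in range(replication_factor)
        }
    return placement


def available_topics(placement, live_brokers):
    """A topic is reachable in this model if at least one replica is on a live broker."""
    live = set(live_brokers)
    return {topic for topic, replicas in placement.items() if replicas & live}


brokers = ["kafka0", "kafka1"]
topics = ["health", "invoker0"]  # "invoker0" is an illustrative name

for factor in (1, 2):
    placement = assign_replicas(topics, brokers, factor)
    # Simulate step 2 of the manual test: kafka0 is stopped, kafka1 stays up.
    reachable = available_topics(placement, ["kafka1"])
    print(f"replicationFactor={factor}: reachable after stopping kafka0: {sorted(reachable)}")
```

In this model a factor of `1` puts the health topic on `kafka0` only, so stopping that node makes the topic unreachable, while a factor of `2` keeps every topic reachable until both nodes are stopped.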
I think we need an automated test that checks a scenario similar to the one I posted above. You might want to check out the following example and add an automated `ShootKafka` test: https://github.com/apache/incubator-openwhisk/blob/master/tests/src/test/scala/ha/ShootComponentsTests.scala

Again, many thanks for the contribution. I'm looking forward to getting this in.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
