jeremiaswerner commented on issue #2744: Deploy kafka & zookeeper cluster with 
ansible
URL: 
https://github.com/apache/incubator-openwhisk/pull/2744#issuecomment-343194451
 
 
   @devbv Thanks a lot for this very valuable and appreciated contribution. 
I've played around with the change and have left a few minor comments. 
   
   I did the following manual resiliency test on my docker-machine deployment 
with two kafka containers. To get the test run successful I needed to adopt the 
`replicationFactor: 2` (instead of `1`). See my comment below and on the 
setting.
   
   Prereq:
    - Deploy this PR with two kafka nodes `kafka0, kafka1` in the docker-machine
    - Create a simple hello world action
   
   Test:
   1. invoke an action and expect it succeeds -> works as expected ? 
   2. stop kafka0
   3. invoke an action and expect it succeeds -> works as expected ?
   4. wait some time to let the invoker idle
   5. check the invoker logs to see if health pings reach the controller -> 
works as expected ?
   6. stop kafka1
   7. invoke an action and expected it fails -> works as expected ?
   8. start kafka1 and/or kafka0
   9. invoke an action and expect it succeeds again -> works as expected ?
   
   If we don't adjust the `replicationFactor` to `2` the topics seem to be 
distributed across the two nodes of the cluster. That leads to the situation 
where the health topic lives on just a single node and if we kill that node the 
controller would assume all invokers are down because invokers are not able to 
send the health pings to the controller.
   
   For HA and resiliency reasons I therefore vote to make the replicationFactor 
configurable per environment. I'am not 100% sure if the replicationFactor must 
be equal to the number of kafka nodes and/or if the replicationFactor has a 
negative impact on latency and throughput.
   
   I'll therefore further test the latency and throughput in a distributed 
deployment with different replicationFactors and will post the results here as 
datapoints.
   
   I think we need to have a automated test that checks a similar scenario I've 
posted above. You might want to checkout the following example to add an 
automated test `ShootKafka` 
   
https://github.com/apache/incubator-openwhisk/blob/master/tests/src/test/scala/ha/ShootComponentsTests.scala
   
   Again, many thanks for the contribution and I'am looking forward to get this 
in.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to