[
https://issues.apache.org/jira/browse/KAFKA-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mikaël Cluseau updated KAFKA-2426:
----------------------------------
Description:
Hi,
when used behind a firewall, Apache Kafka nodes try to connect to themselves
using their advertised hostnames. This means that if you have a service IP
managed by the Docker host using *only* iptables DNAT rules, the node's
connection to "itself" times out.
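One way to confirm the symptom from inside the container is to attempt a plain
TCP connection to the advertised hostname and port. The following is only a
minimal sketch: the function name `can_reach` and the 3-second timeout are
illustrative assumptions, not something Kafka provides.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time.

    Hypothetical helper: run it inside the broker's container with the
    advertised hostname and port to see whether the node can reach "itself".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name-resolution failures.
        return False
```

If this returns False from inside the container while the same hostname and
port are reachable from other hosts, the setup is probably affected by the
behaviour described here.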
This is the case in any setup where a host DNATs the service IP to the
instance's IP and sends the packet back on the same interface over a Linux
bridge port not configured in "hairpin" mode. It's because of this:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/bridge/br_forward.c#n30
The relevant part of the Kubernetes issue is here:
https://github.com/BenTheElder/kubernetes/issues/3#issuecomment-123925060 .
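To check whether the bridge ports on a host have hairpin mode enabled, the
per-port `hairpin_mode` flag that Linux exposes under sysfs can be read. This
is a rough sketch that assumes the usual `/sys/class/net/<bridge>/brif/<port>/`
layout; the function name `hairpin_status` and the bridge name `docker0` in the
usage note are illustrative assumptions.

```python
from pathlib import Path
from typing import Dict


def hairpin_status(bridge, sysfs_root="/sys/class/net"):
    # type: (str, str) -> Dict[str, bool]
    """Map each port of `bridge` to True if hairpin mode is enabled on it.

    Assumes each port directory under <bridge>/brif/ contains a
    `hairpin_mode` file holding "0" or "1".
    """
    brif = Path(sysfs_root) / bridge / "brif"
    status = {}
    for port in sorted(brif.iterdir()):
        flag = (port / "hairpin_mode").read_text().strip()
        status[port.name] = (flag == "1")
    return status
```

For example, `hairpin_status("docker0")` on the Docker host would list every
veth port attached to the bridge; a port reported as False is one where a
DNATed packet will not be sent back out on the port it arrived on.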
The timeout means that even if a partition's leader is elected, it then fails
to accept writes from the other members, causing a write lock and generating
very heavy logs (as fast as Kafka usually is, but through log4j this
time ;)).
This also means that the normal Docker case works by going through the
userspace proxy, which necessarily impacts performance.
The workaround for us was to add a "127.0.0.2 advertised-hostname" to
/etc/hosts in the container startup script.
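The workaround above could be scripted roughly as follows. This is a sketch
under stated assumptions: the function name `ensure_hosts_entry` is
hypothetical, and the check only makes the append idempotent so that repeated
container starts do not add duplicate lines.

```python
def ensure_hosts_entry(hostname, ip="127.0.0.2", hosts_path="/etc/hosts"):
    """Append "<ip> <hostname>" to the hosts file unless it is already there.

    Returns True if a line was appended, False if the entry already existed.
    """
    with open(hosts_path, "r", encoding="utf-8") as f:
        content = f.read()

    for line in content.splitlines():
        # Ignore comments, then compare the address and the listed names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip and hostname in fields[1:]:
            return False

    with open(hosts_path, "a", encoding="utf-8") as f:
        # Make sure the new entry starts on its own line.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write("{} {}\n".format(ip, hostname))
    return True
```

A container startup script would call `ensure_hosts_entry(...)` with the
advertised hostname before launching the broker. Note that the sketch does not
handle the case where the hostname is already mapped to a different address
earlier in the file.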
was:
Hi,
when used behind a firewall, Apache Kafka nodes try to connect to themselves
using their advertised hostnames. This means that if you have a service IP
managed by the Docker host using *only* iptables DNAT rules, the node's
connection to "itself" times out.
This is the case in any setup where a host DNATs the service IP to the
instance's IP and sends the packet back on the same interface over a Linux
bridge port not configured in "hairpin" mode. It's because of this:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/bridge/br_forward.c#n30
The relevant part of the Kubernetes issue is here:
https://github.com/BenTheElder/kubernetes/issues/3#issuecomment-123925060 .
The timeout means that even if a partition's leader is elected, it then fails
to accept writes from the other members, causing a write lock and generating
very heavy logs (as fast as Kafka usually is, but through log4j this
time ;)).
This also means that the normal Docker case works by going through the
userspace proxy, which necessarily impacts performance.
The workaround for us was to add a "127.0.0.2 {advertised hostname}" to
/etc/hosts in the container startup script.
> A Kafka node tries to connect to itself through its advertised hostname
> -----------------------------------------------------------------------
>
> Key: KAFKA-2426
> URL: https://issues.apache.org/jira/browse/KAFKA-2426
> Project: Kafka
> Issue Type: Bug
> Components: network
> Affects Versions: 0.8.2.1
> Environment: Docker https://github.com/wurstmeister/kafka-docker,
> managed by a Kubernetes cluster, with an "iptables proxy".
> Reporter: Mikaël Cluseau
> Assignee: Jun Rao
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)