Hi,

I did some testing regarding the failure Patrick found with the 23 ensemble members. (By the way, zkconf is a great tool! I hadn't seen it before.)

The exception happens when the new MultiAddress feature tries to filter the unreachable hosts out of the address list. This involves calling the InetAddress.isReachable method with a default timeout of 500 ms, which goes down to a native call in Java and basically tries to ping the host (an ICMP echo request). Naturally, localhost should always be reachable. For some reason, this call times out on Mac when there are many ensemble members.

I tested with 9 members and the cluster started properly. With 11, 13 and 15 members it took more and more time for the cluster to start, and "NoRouteToHostException" started to appear in the logs. After around 1 minute the 15-member cluster started, but this is obviously not good. (I also tried with JDK 11 and found the same behaviour.)

On Linux, I haven't been able to reproduce the problem. I tried with 5, 9, 15 and 23 ensemble members and the quorum always seems to start properly within a few seconds. (I used OpenJDK 1.8.232 on Ubuntu 18.04.)

I created a Jira ticket for the issue and will try to figure out a fix quickly:
https://issues.apache.org/jira/browse/ZOOKEEPER-3698

Kind regards,
Mate

On Thu, Jan 16, 2020 at 7:47 PM Szalay-Bekő Máté <szalay.beko.m...@gmail.com> wrote:

> Yep, ZOOKEEPER-3530 <https://issues.apache.org/jira/browse/ZOOKEEPER-3530> was
> me. :) Sorry for not bringing this discussion to the mailing list...
>
> I think having a ...-lib.tgz file generated by maven makes sense, so
> people who are using the C client / C libraries can have them built into a
> single file on their platform. I think this is also something that apache
> bigtop is looking for. This was something that we had in zookeeper 3.4 as
> well, but we lost it after the maven migration.
>
> Actually having this file generated during the build doesn't mean that we
> have to upload to any official ftp site.
> It doesn't need to be part of any
> official (or "convenience") file we share during the release procedure.
>
> Still, if you think it is a bad thing to generate this file together with the
> other two artifacts, I propose to hide it behind a maven option. So if
> someone wishes to build this artifact for themselves (e.g. we are using it in
> our company), they would still be able to do so by using new maven options
> like `mvn clean install -Pfull-build -Pgenerate-native-artifact`.
>
> Kind regards,
> Mate
>
> On Thu, Jan 16, 2020 at 7:10 PM Andor Molnar <an...@apache.org> wrote:
>
>> > On 2020. Jan 16., at 18:10, Patrick Hunt <ph...@apache.org> wrote:
>> >
>> >> 2) “lib” tarball
>> >> I think we’ve already talked about releasing C binaries and I had always
>> >> been against it. These libraries are not portable and unless we release
>> >> separate artifacts for all major distributions (including Windows?), I
>> >> don’t see the point of introducing it. Plus the things that Patrick
>> >> mentioned, I strongly believe that we should remove it from the release.
>> >
>> > Sorry if I missed, but was this actually discussed? I don't remember seeing
>> > it on the mailing list - big shifts like this deserve a community wide
>> > discussion thread, and perhaps even a vote, imo.
>>
>> No worries it wasn’t really a discussion and as far as I remember it
>> happened on github. I’ve found the Jira:
>> https://issues.apache.org/jira/browse/ZOOKEEPER-3530
>>
>> …but can’t find my comment, so it probably happened earlier.
>>
>> Anyway, ticket has been closed already, Enrico is removing it from this
>> release, so if somebody has a very very strong feeling to resurrect the
>> topic, feel free to email the @dev list.
>>
>> Andor
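
For anyone who wants to look at the reachability check from the top of this message in isolation, here is a minimal sketch. It only calls InetAddress.isReachable on the loopback address with the same 500 ms timeout and prints how long each call takes. The class and method names (ReachabilityProbe, probe, probeRepeatedly) are made up for illustration and are not ZooKeeper code, and running the calls one after another in a single JVM is an assumption that does not reproduce a real ensemble, where every member runs in its own process.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of ZooKeeper.
public class ReachabilityProbe {

    /** Outcome of a single isReachable call plus how long it took. */
    static final class ProbeResult {
        final boolean reachable;
        final long elapsedMillis;

        ProbeResult(boolean reachable, long elapsedMillis) {
            this.reachable = reachable;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Calls InetAddress.isReachable once and measures the elapsed time. */
    static ProbeResult probe(InetAddress address, int timeoutMillis) {
        long start = System.nanoTime();
        boolean reachable;
        try {
            reachable = address.isReachable(timeoutMillis);
        } catch (IOException e) {
            // A network error during the check is treated as "not reachable".
            reachable = false;
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new ProbeResult(reachable, elapsedMillis);
    }

    /** Runs the probe several times in a row against the same address. */
    static List<ProbeResult> probeRepeatedly(InetAddress address, int timeoutMillis, int attempts) {
        List<ProbeResult> results = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            results.add(probe(address, timeoutMillis));
        }
        return results;
    }

    public static void main(String[] args) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // 500 ms matches the default timeout mentioned above; 23 attempts
        // mirrors the ensemble size, but that mapping is only an assumption.
        int attempt = 1;
        for (ProbeResult r : probeRepeatedly(loopback, 500, 23)) {
            System.out.println("attempt " + attempt++ + ": reachable=" + r.reachable
                    + ", took " + r.elapsedMillis + " ms");
        }
    }
}
```

If localhost behaves as expected, every attempt should report reachable=true well within the timeout. Calls that take close to 500 ms and report false would match the timeouts described above, although this sketch alone cannot say why they happen.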