Thanks, Brent, for raising this concern! We were not previously aware of this ZooKeeper-level backward incompatibility.
I think you can submit the log4j patch to the 1.0.2 branch in Apache Helix as a hotfix. However, I am not sure whether we can do a release for it, since Apache Helix versions have no build number. I have added the dev list to see whether anyone has other suggestions for this scenario.

Best,

Junkai

On Mon, Jul 18, 2022 at 3:34 PM Brent <brentwritesc...@gmail.com> wrote:

> Hey Helix folks,
>
> We ran into a fun issue recently. Between the release of Apache Helix
> v1.0.3 on April 14 and the recent release of v1.0.4 on June 9, it looks
> like a backward-incompatible change may have been introduced on June 3rd
> that makes Helix v1.0.4 not work correctly on Zookeeper 3.4.x clusters.
>
> I do acknowledge that Zookeeper 3.4.x was end-of-lifed on June 1st 2020
> (https://lists.apache.org/thread/xckr6nnsg9rxchkbvltkvt7hr2d0mhbo), so
> that certainly factors in, but it's what our organizational team is
> supporting. So unfortunately we're stuck between a rock and a hard
> place at the moment:
> - We can't go back to v1.0.2 because it lacks the Log4j fixes
> - We can't use v1.0.3 due to the corruption issue
> - We can't move ahead to v1.0.4 due to the compatibility issue with
>   Zookeeper
> I have a fork we were previously using
> (https://github.com/brentwritescode/helix/releases/tag/1.0.2-with-log4j-2.17.1),
> but that's not a long-term solution either.
>
> The issue is a bit subtle. From v1.0.2 to v1.0.3, the
> org.apache.zookeeper version requirement in helix/zookeeper-api was
> bumped from 3.4.13 to 3.5.9:
> - v1.0.2:
>   https://github.com/apache/helix/blob/c219050f8dc02c25451493f96575b56fabbf2c1e/zookeeper-api/pom.xml#L58
> - v1.0.3:
>   https://github.com/apache/helix/blob/46b705f7d47990fa7bf1feeb6c64457e3d80af22/zookeeper-api/pom.xml#L54
> So that, in and of itself, was not breaking.
>
> And then from v1.0.3 to v1.0.4, some code changes were introduced in this
> PR (https://github.com/apache/helix/pull/2138/files) that relied
> specifically on that 3.5.x Zookeeper version. For example, the "import
> org.apache.zookeeper.AsyncCallback.Create2Callback" that was added to
> "helix/zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/callback/ZkAsyncCallbacks.java"
> in that PR introduces a backward incompatible change.
>
> So the net result is that, unfortunately, there has been a drift over the
> past two versions (from v1.0.2 to v1.0.4) that has rendered Zookeeper 3.4.x
> clusters incompatible with Apache Helix.
>
> I wanted to post this here:
>
> 1. To see if you were all aware of it (since it may hit other customers
>    as well and we were a bit blind-sided by it)
> 2. To see if you had any ideas on how to work with/around this
>
> Our long-term plan will obviously be to get on newer Zookeeper clusters as
> we can, but that's likely not going to be a quick turn-around for us. In
> the short-term we'll need to revert back to our v1.0.2 fork.
>
> Does the team happen to have any other comments or suggestions on dealing
> with this issue? Is this correctable at the project level (I suspect that
> will be tough)?
>
> Thanks much!
>
> ~Brent
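
For anyone who wants to check which side of this incompatibility a given classpath falls on, here is a minimal sketch of a runtime probe. It is not part of Helix: the class name ZkApiProbe and the method isClassPresent are hypothetical names made up for illustration. The only name taken from the thread is org.apache.zookeeper.AsyncCallback.Create2Callback, the interface imported by the PR above, which per the report is available with the 3.5.x ZooKeeper client but not with 3.4.x.

```java
// Hypothetical helper, not part of Helix or ZooKeeper.
final class ZkApiProbe {

    // Binary name of the nested interface imported by the PR discussed above.
    static final String CREATE2_CALLBACK =
        "org.apache.zookeeper.AsyncCallback$Create2Callback";

    private ZkApiProbe() {
    }

    // Returns true if the named class can be loaded from the current classpath.
    // The class is looked up without being initialized.
    static boolean isClassPresent(String binaryName) {
        try {
            Class.forName(binaryName, false, ZkApiProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean present = isClassPresent(CREATE2_CALLBACK);
        System.out.println("Create2Callback on classpath: " + present);
    }
}
```

Running this with the same ZooKeeper client jar that the application uses would show whether the class that ZkAsyncCallbacks.java imports can be resolved. It only inspects the client-side jar; it says nothing about the version of the ZooKeeper servers in the cluster.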