Hi, Let me summarize and give a quick update on the outstanding issues for 3.5 GA:
- ZOOKEEPER-1818 (Fix don't care for trunk) - ZOOKEEPER-2778 (Potential server deadlock between follower sync with leader and follower receiving external connection requests.) - ZOOKEEPER-3021 Migrate project structure to Maven (ongoing) - ZOOKEEPER-925 Docs generation to Maven - ZOOKEEPER-3104 (waiting for backport) - ZOOKEEPER-3125 (waiting for backport PR #647) The 2 Maven related tickets are no-brainers as well as the backports. ZK-2778 has been picked up by Maoling (thanks!) as far as I can see, ZK-1818 is the only one waiting for a volunteer. Please correct me if I’ve missed something. Regards, Andor > On 2018. Sep 28., at 18:32, Tamas Penzes <tam...@cloudera.com.INVALID> wrote: > > Hi All, > > I would add ZOOKEEPER-3021 > <https://issues.apache.org/jira/browse/ZOOKEEPER-3021> Migrate project > structure to Maven build as a blocker too. Since the migration has started > it would be good to finish before releasing ZK 3.5.x GA. > > ZOOKEEPER-925 <https://issues.apache.org/jira/browse/ZOOKEEPER-925> replace > our forrest site and documentation generation might also be a good idea, > since then we could deliver the new MarkDown based documentation. > > Regards, Tamaas > > On Fri, Sep 14, 2018 at 10:09 AM Fangmin Lv <lvfang...@gmail.com> wrote: > >> Oh, sorry for the confusion, I should provide more context. >> >> Leader will use on disk txn sync with followers to if the peer zxid is not >> in it's in memory commit logs, the code is here: Leader on disk txn sync >> < >> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java#L774 >>> . >> There is bug that potentially there will be gap in the txn files, like >> after snap sync, etc, so it's possible the peer will miss txns due to this. >> >> The option to disable it is snapshotSizeFactor >> < >> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZKDatabase.java#L81 >>> , >> set it to -1 will disable this feature. On 3.5, it's better to have a PR to >> set this to -1 by default. It might have more SNAP sync, but from our prod >> it doesn't seem to be a big problem to me. >> >> I can send out the diff to disable it by default on 3.5 if you guys think >> this is the right way to do. >> >> Thanks, >> Fangmin >> >> On Thu, Sep 13, 2018 at 1:58 AM Andor Molnar <an...@apache.org> wrote: >> >>> What’s needed to turn it off? >>> Do we need a PR or it’s just a config option? >>> Shall we implement a feature switch for that and turn it off by default? >>> >>> Sorry I don’t have too much insight on disk txn sync. >>> >>> Andor >>> >>> >>> >>>> On 2018. Sep 13., at 9:16, Fangmin Lv <lvfang...@gmail.com> wrote: >>>> >>>> And to be clear, ZOOKEEPER-2418 is actually just one case of >>> inconsistency >>>> which could caused by on disk txn sync, as I mentioned in a newer JIRA >>>> ZOOKEEPER-2846 <https://issues.apache.org/jira/browse/ZOOKEEPER-2846>, >>> the >>>> snap sync or txn sync could also leave txns gap in the txn file, which >>> is a >>>> more common case could trigger this issue. >>>> >>>> I would suggest to turn off the on disk txn sync by default for now to >>>> avoid this issue, after we finished ZOOKEEPER-3114, we can use that to >>>> validate the on disk txns during syncing. >>>> >>>> Thanks, >>>> Fangmin >>>> >>>> On Wed, Sep 12, 2018 at 9:55 AM Fangmin Lv <lvfang...@gmail.com> >> wrote: >>>> >>>>> Andor, >>>>> >>>>> ZOOKEEPER-3114 is about adding real time digest checking to help >>> detecting >>>>> inconsistency, it's a new feature with amounts of code change. I'll >>> start >>>>> upstream it part by part, but I don't expect it's being merged in the >>> next >>>>> few weeks. So yes, it's a nice to have, but definitely not a block for >>> 3.5. >>>>> >>>>> Thanks, >>>>> Fangmin >>>>> >>>>> On Wed, Sep 12, 2018 at 2:55 AM Andor Molnar <an...@apache.org> >> wrote: >>>>> >>>>>> Fangmin, >>>>>> >>>>>> Sorry, I just noticed that you want to include the consistency fixes >> in >>>>>> the stable version which is fine. Let’s finish the backports and >> we’ll >>> be >>>>>> done with them. >>>>>> >>>>>> ZOOKEEPER-3114 is essentially a new feature, I wouldn’t block 3.5 >> with >>>>>> that. What do you think? >>>>>> >>>>>> Andor >>>>>> >>>>>> >>>>>> >>>>>>> On 2018. Sep 12., at 11:52, Andor Molnar <an...@apache.org> wrote: >>>>>>> >>>>>>> Cool, thanks for the clarification. >>>>>>> >>>>>>> The updated list is as follows: >>>>>>> >>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol) >>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk) >>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync >> with >>>>>> leader and follower receiving external connection requests.) >>>>>>> >>>>>>> The following are not critical and no blockers for the stable >> release: >>>>>>> >>>>>>> Waiting for to be ported to 3.5: >>>>>>> - ZOOKEEPER-3104 >>>>>>> - ZOOKEEPER-3125 >>>>>>> - ZOOKEEPER-3127 >>>>>>> >>>>>>> New feature: >>>>>>> - ZOOKEEPER-3114 (fixes ZOOKEEPER-2184 too) >>>>>>> >>>>>>> Regards, >>>>>>> Andor >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On 2018. Sep 12., at 0:42, Fangmin Lv <lvfang...@gmail.com> wrote: >>>>>>>> >>>>>>>> Hi Andor, >>>>>>>> >>>>>>>> That's the on disk txn feature, which was disabled internally after >>> we >>>>>>>> found the potentially inconsistent issue. The only solution we have >>>>>> for now >>>>>>>> is waiting for the new digest checking feature I mentioned in >>>>>>>> ZOOKEEPER-3114. >>>>>>>> >>>>>>>> I think there are some other critical consistent issues we just >> fixed >>>>>> on >>>>>>>> master recently: ZOOKEEPER-3104, ZOOKEEPER-3125, ZOOKEEPER-3127, I >>>>>> think we >>>>>>>> should include that in the official 3.5 release as well. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Fangmin >>>>>>>> >>>>>>>> On Tue, Sep 11, 2018 at 11:58 AM Andor Molnár <an...@apache.org> >>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Jeelani, >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks for letting me know. I'm happy to remove it from the list >> to >>>>>> get >>>>>>>>> closer to a stable release. :) >>>>>>>>> >>>>>>>>> What's the feature which can be disabled to avoid data >>> inconsistency? >>>>>>>>> >>>>>>>>> >>>>>>>>> Andor >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 09/10/2018 11:33 PM, Mohamed Jeelani wrote: >>>>>>>>>> Thanks Andor for compiling this. Should we be ignoring >>>>>> ZOOKEEPER-2418 as >>>>>>>>> well? This exists in 3.4 as well and the feature can be disabled. >> We >>>>>> are >>>>>>>>> working on a longer term fix for it in 3.6. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> >>>>>>>>>> Jeelani >>>>>>>>>> >>>>>>>>>> On 9/10/18, 5:19 AM, "Andor Molnar" <an...@cloudera.com.INVALID >>> >>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Fine. >>>>>>>>>> >>>>>>>>>> I'm happy to ignore 1549, 2846 and 2930. Still we have the list >>> of: >>>>>>>>>> >>>>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol) >>>>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk) >>>>>>>>>> - ZOOKEEPER-2418 (txnlog diff sync can skip sending some >>>>>>>>> transactions to >>>>>>>>>> followers) >>>>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower >> sync >>>>>>>>> with >>>>>>>>>> leader and follower receiving external connection requests.) >>>>>>>>>> >>>>>>>>>> SSL (ZK-236) is a feature which essential for the 3.5 release, >>>>>> hence >>>>>>>>> I >>>>>>>>>> wouldn't leave it out or postpone it for the next stable >> release. >>>>>> PR >>>>>>>>> has >>>>>>>>>> been out for a long time, get on reviewing please. >>>>>>>>>> The rest are also long outstanding issues which have been found >> in >>>>>>>>> the 3.5 >>>>>>>>>> branch. >>>>>>>>>> ZK-1818 is something which was found in 3.4 and fixed in 3.4, >> but >>>>>>>>> never has >>>>>>>>>> been fixed in 3.5. Quite a serious issue if still present. >>>>>>>>>> >>>>>>>>>> I think we should at least run some manual testing and see if we >>>>>>>>> could >>>>>>>>>> repro any of these issues before going ahead with a stable >>> release. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Andor >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Sep 7, 2018 at 3:24 AM, Michael Han <h...@apache.org> >>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> I haven't went through the entire list, but looks like lots of >> the >>>>>>>>> JIRA >>>>>>>>>>> issues listed in this thread, such as ZOOKEEPER-1549, 2846, also >>>>>>>>> affects >>>>>>>>>>> 3.4 releases. Should we scope these issues out? >>>>>>>>>>> >>>>>>>>>>> I think historically the single outstanding blocking issue for a >>>>>>>>> stable 3.5 >>>>>>>>>>> release is the reconfig feature and security concerns around it >>>>>>>>> (somehow >>>>>>>>>>> addressed in ZOOKEEPER-2014), and the alpha and beta releases >> were >>>>>>>>> created >>>>>>>>>>> to stabilize that feature. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>> >>> >> https://urldefense.proofpoint.com/v2/url?u=http-3A__zookeeper-2Duser.578899.n2.nabble.com_Zookeeper-2Dwith-2D&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=_tGtL3nMWtuPrXKXDx27AIWOzyyT7W-CjIVLDFZwT0E&e= >>>>>>>>>>> SSL-release-date-tt7581744.html >>>>>>>>>>> >>>>>>>>>>> So it looks like we are in good shape to release. Something >> might >>>>>>>>> worth >>>>>>>>>>> doing to claim the quality of 3.5 is on par with 3.4 >>>>>>>>>>> >>>>>>>>>>> * Run Jepsen on 3.5 - 3.4 passed the test for the record >>>>>>>>>>> >>>>>>>>> >>>>>> >>> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__aphyr.com_posts_291-2Djepsen-2Dzookeeper&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=VjORkX5s7hrJyl8mW9Q4cfeSWF4qfTdyRjcuAiBt0y4&e= >>>>>>>>>>> * Fix all flaky tests on 3.5 - 3.4 has little or no flaky tests >> at >>>>>>>>> all. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar >>>>>>>>> <an...@cloudera.com.invalid> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Thanks Maoling! That would be huge help, I appreciate it. >>>>>>>>>>>> >>>>>>>>>>>> Andor >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> >>>>>> >>> >>>