Hi folks,

You’ve probably noticed a lot of update emails coming from Jira. Please be aware that we’ve updated a number of open blocker/critical 3.5 tickets to reflect what we discussed in this email thread.

If you open the following Jira filter:

project = ZooKeeper and resolution = Unresolved and fixVersion = 3.5.5 AND priority in (blocker, critical) ORDER BY priority DESC, key ASC

you’ll see the most up-to-date list of tickets that need to be addressed before the stable 3.5 release. Thank you for your efforts to get this done.

Fangmin, ZK-3104 is waiting for a backport, but the ticket has already been resolved. Have you created a separate ticket for the backport, or shall I just reopen it with the right fix versions?

Thanks,
Andor

> On 2018. Oct 8., at 12:34, Andor Molnar <an...@apache.org> wrote:
>
> Hi,
>
> Let me summarize and give a quick update on the outstanding issues for 3.5 GA:
>
> - ZOOKEEPER-1818 (Fix don't care for trunk)
> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with leader
> and follower receiving external connection requests.)
> - ZOOKEEPER-3021 Migrate project structure to Maven (ongoing)
> - ZOOKEEPER-925 Docs generation to Maven
> - ZOOKEEPER-3104 (waiting for backport)
> - ZOOKEEPER-3125 (waiting for backport PR #647)
>
> The 2 Maven related tickets are no-brainers as well as the backports. ZK-2778
> has been picked up by Maoling (thanks!) as far as I can see, ZK-1818 is the
> only one waiting for a volunteer.
>
> Please correct me if I’ve missed something.
>
> Regards,
> Andor
>
>> On 2018. Sep 28., at 18:32, Tamas Penzes <tam...@cloudera.com.INVALID> wrote:
>>
>> Hi All,
>>
>> I would add ZOOKEEPER-3021
>> <https://issues.apache.org/jira/browse/ZOOKEEPER-3021> Migrate project
>> structure to Maven build as a blocker too. Since the migration has started
>> it would be good to finish before releasing ZK 3.5.x GA.
>>
>> ZOOKEEPER-925 <https://issues.apache.org/jira/browse/ZOOKEEPER-925> replace
>> our forrest site and documentation generation might also be a good idea,
>> since then we could deliver the new MarkDown based documentation.
>>
>> Regards, Tamaas
>>
>> On Fri, Sep 14, 2018 at 10:09 AM Fangmin Lv <lvfang...@gmail.com> wrote:
>>
>>> Oh, sorry for the confusion, I should provide more context.
>>>
>>> Leader will use on disk txn sync with followers to if the peer zxid is not
>>> in it's in memory commit logs, the code is here: Leader on disk txn sync
>>> <https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java#L774>.
>>> There is bug that potentially there will be gap in the txn files, like
>>> after snap sync, etc, so it's possible the peer will miss txns due to this.
>>>
>>> The option to disable it is snapshotSizeFactor
>>> <https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZKDatabase.java#L81>,
>>> set it to -1 will disable this feature. On 3.5, it's better to have a PR to
>>> set this to -1 by default. It might have more SNAP sync, but from our prod
>>> it doesn't seem to be a big problem to me.
>>>
>>> I can send out the diff to disable it by default on 3.5 if you guys think
>>> this is the right way to do.
>>>
>>> Thanks,
>>> Fangmin
>>>
>>> On Thu, Sep 13, 2018 at 1:58 AM Andor Molnar <an...@apache.org> wrote:
>>>
>>>> What’s needed to turn it off?
>>>> Do we need a PR or it’s just a config option?
>>>> Shall we implement a feature switch for that and turn it off by default?
>>>>
>>>> Sorry I don’t have too much insight on disk txn sync.
>>>>
>>>> Andor
>>>>
>>>>> On 2018. Sep 13., at 9:16, Fangmin Lv <lvfang...@gmail.com> wrote:
>>>>>
>>>>> And to be clear, ZOOKEEPER-2418 is actually just one case of inconsistency
>>>>> which could caused by on disk txn sync, as I mentioned in a newer JIRA
>>>>> ZOOKEEPER-2846 <https://issues.apache.org/jira/browse/ZOOKEEPER-2846>, the
>>>>> snap sync or txn sync could also leave txns gap in the txn file, which is a
>>>>> more common case could trigger this issue.
>>>>>
>>>>> I would suggest to turn off the on disk txn sync by default for now to
>>>>> avoid this issue, after we finished ZOOKEEPER-3114, we can use that to
>>>>> validate the on disk txns during syncing.
>>>>>
>>>>> Thanks,
>>>>> Fangmin
>>>>>
>>>>> On Wed, Sep 12, 2018 at 9:55 AM Fangmin Lv <lvfang...@gmail.com> wrote:
>>>>>
>>>>>> Andor,
>>>>>>
>>>>>> ZOOKEEPER-3114 is about adding real time digest checking to help detecting
>>>>>> inconsistency, it's a new feature with amounts of code change. I'll start
>>>>>> upstream it part by part, but I don't expect it's being merged in the next
>>>>>> few weeks. So yes, it's a nice to have, but definitely not a block for 3.5.
>>>>>>
>>>>>> Thanks,
>>>>>> Fangmin
>>>>>>
>>>>>> On Wed, Sep 12, 2018 at 2:55 AM Andor Molnar <an...@apache.org> wrote:
>>>>>>
>>>>>>> Fangmin,
>>>>>>>
>>>>>>> Sorry, I just noticed that you want to include the consistency fixes in
>>>>>>> the stable version which is fine. Let’s finish the backports and we’ll be
>>>>>>> done with them.
>>>>>>>
>>>>>>> ZOOKEEPER-3114 is essentially a new feature, I wouldn’t block 3.5 with
>>>>>>> that. What do you think?
>>>>>>>
>>>>>>> Andor
>>>>>>>
>>>>>>>> On 2018. Sep 12., at 11:52, Andor Molnar <an...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Cool, thanks for the clarification.
>>>>>>>>
>>>>>>>> The updated list is as follows:
>>>>>>>>
>>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
>>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with
>>>>>>>> leader and follower receiving external connection requests.)
>>>>>>>>
>>>>>>>> The following are not critical and no blockers for the stable release:
>>>>>>>>
>>>>>>>> Waiting for to be ported to 3.5:
>>>>>>>> - ZOOKEEPER-3104
>>>>>>>> - ZOOKEEPER-3125
>>>>>>>> - ZOOKEEPER-3127
>>>>>>>>
>>>>>>>> New feature:
>>>>>>>> - ZOOKEEPER-3114 (fixes ZOOKEEPER-2184 too)
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Andor
>>>>>>>>
>>>>>>>>> On 2018. Sep 12., at 0:42, Fangmin Lv <lvfang...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Andor,
>>>>>>>>>
>>>>>>>>> That's the on disk txn feature, which was disabled internally after we
>>>>>>>>> found the potentially inconsistent issue. The only solution we have for now
>>>>>>>>> is waiting for the new digest checking feature I mentioned in
>>>>>>>>> ZOOKEEPER-3114.
>>>>>>>>>
>>>>>>>>> I think there are some other critical consistent issues we just fixed on
>>>>>>>>> master recently: ZOOKEEPER-3104, ZOOKEEPER-3125, ZOOKEEPER-3127, I think we
>>>>>>>>> should include that in the official 3.5 release as well.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Fangmin
>>>>>>>>>
>>>>>>>>> On Tue, Sep 11, 2018 at 11:58 AM Andor Molnár <an...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Jeelani,
>>>>>>>>>>
>>>>>>>>>> Thanks for letting me know. I'm happy to remove it from the list to get
>>>>>>>>>> closer to a stable release. :)
>>>>>>>>>>
>>>>>>>>>> What's the feature which can be disabled to avoid data inconsistency?
>>>>>>>>>>
>>>>>>>>>> Andor
>>>>>>>>>>
>>>>>>>>>> On 09/10/2018 11:33 PM, Mohamed Jeelani wrote:
>>>>>>>>>>> Thanks Andor for compiling this. Should we be ignoring ZOOKEEPER-2418 as
>>>>>>>>>>> well? This exists in 3.4 as well and the feature can be disabled. We are
>>>>>>>>>>> working on a longer term fix for it in 3.6.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Jeelani
>>>>>>>>>>>
>>>>>>>>>>> On 9/10/18, 5:19 AM, "Andor Molnar" <an...@cloudera.com.INVALID> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Fine.
>>>>>>>>>>>
>>>>>>>>>>> I'm happy to ignore 1549, 2846 and 2930. Still we have the list of:
>>>>>>>>>>>
>>>>>>>>>>> - ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>>>>>>>>>>> - ZOOKEEPER-1818 (Fix don't care for trunk)
>>>>>>>>>>> - ZOOKEEPER-2418 (txnlog diff sync can skip sending some transactions to
>>>>>>>>>>> followers)
>>>>>>>>>>> - ZOOKEEPER-2778 (Potential server deadlock between follower sync with
>>>>>>>>>>> leader and follower receiving external connection requests.)
>>>>>>>>>>>
>>>>>>>>>>> SSL (ZK-236) is a feature which essential for the 3.5 release, hence I
>>>>>>>>>>> wouldn't leave it out or postpone it for the next stable release. PR has
>>>>>>>>>>> been out for a long time, get on reviewing please.
>>>>>>>>>>> The rest are also long outstanding issues which have been found in the 3.5
>>>>>>>>>>> branch.
>>>>>>>>>>> ZK-1818 is something which was found in 3.4 and fixed in 3.4, but never has
>>>>>>>>>>> been fixed in 3.5. Quite a serious issue if still present.
>>>>>>>>>>>
>>>>>>>>>>> I think we should at least run some manual testing and see if we could
>>>>>>>>>>> repro any of these issues before going ahead with a stable release.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Andor
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 7, 2018 at 3:24 AM, Michael Han <h...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I haven't went through the entire list, but looks like lots of the JIRA
>>>>>>>>>>>> issues listed in this thread, such as ZOOKEEPER-1549, 2846, also affects
>>>>>>>>>>>> 3.4 releases. Should we scope these issues out?
>>>>>>>>>>>>
>>>>>>>>>>>> I think historically the single outstanding blocking issue for a stable 3.5
>>>>>>>>>>>> release is the reconfig feature and security concerns around it (somehow
>>>>>>>>>>>> addressed in ZOOKEEPER-2014), and the alpha and beta releases were created
>>>>>>>>>>>> to stabilize that feature.
>>>>>>>>>>>>
>>>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__zookeeper-2Duser.578899.n2.nabble.com_Zookeeper-2Dwith-2D&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=_tGtL3nMWtuPrXKXDx27AIWOzyyT7W-CjIVLDFZwT0E&e=
>>>>>>>>>>>> SSL-release-date-tt7581744.html
>>>>>>>>>>>>
>>>>>>>>>>>> So it looks like we are in good shape to release. Something might worth
>>>>>>>>>>>> doing to claim the quality of 3.5 is on par with 3.4
>>>>>>>>>>>>
>>>>>>>>>>>> * Run Jepsen on 3.5 - 3.4 passed the test for the record
>>>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__aphyr.com_posts_291-2Djepsen-2Dzookeeper&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=Vl4oKanLQehvaulUvoKg8A&m=wqlhnot9c-pQLdkGkccSGNpELUNUnB-wy_h0iA3PRqI&s=VjORkX5s7hrJyl8mW9Q4cfeSWF4qfTdyRjcuAiBt0y4&e=
>>>>>>>>>>>> * Fix all flaky tests on 3.5 - 3.4 has little or no flaky tests at all.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar <an...@cloudera.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Maoling! That would be huge help, I appreciate it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andor
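
The Jira filter given at the top of this message can also be run from a script instead of the browser. Below is a rough Python sketch that only builds the search URL for that filter. The helper name build_search_url, the field list and the maxResults value are illustrative choices and not part of the original message. It also assumes that the ASF Jira serves the standard /rest/api/2/search endpoint under https://issues.apache.org/jira (the base URL of the issue links quoted above); this has not been verified against the live server.

```python
from urllib.parse import urlencode

# Base URL taken from the issue links quoted in this thread.
JIRA_BASE = "https://issues.apache.org/jira"

# The filter from the top of this message, copied verbatim.
JQL = (
    "project = ZooKeeper and resolution = Unresolved and fixVersion = 3.5.5 "
    "AND priority in (blocker, critical) ORDER BY priority DESC, key ASC"
)


def build_search_url(jql, base=JIRA_BASE, max_results=50):
    """Return a Jira REST search URL for the given JQL string.

    Assumption: the server exposes the standard /rest/api/2/search endpoint.
    """
    params = {
        "jql": jql,
        "fields": "key,summary,priority",  # illustrative subset of fields
        "maxResults": max_results,
    }
    # urlencode percent-encodes spaces, commas and parentheses in the JQL.
    return f"{base}/rest/api/2/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url(JQL))
```

Fetching the printed URL with any HTTP client should return the matching tickets as JSON, provided the endpoint assumption holds.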