Hi Tsz-Wo,

Thanks for the detailed explanation of RATIS-2149.
+1 for releasing 1.4.1 with Ratis 3.1.1.

Bests,
Sammi

On Sat, 26 Oct 2024 at 00:08, Tsz Wo Sze <szets...@gmail.com> wrote:

> Hi Sammi,
>
> RATIS-2149 <https://issues.apache.org/jira/browse/RATIS-2149> is a minor problem which has been there for a very long time. It says that, when a server starts, it may start a leader election even if the server is not ready. The reporter said that they would call addGroup to a new server S before calling setConf to add S to the group. When setConf has failed (not sure why) and S has started a leader election, S could get a NOT_IN_CONF reply and then shut down.
>
> It seems that they might have called addGroup incorrectly. If we call addGroup with an empty group, the new server will start with the initializing state but not the follower state, see [1]. Then, it won't start a leader election.
>
> [1] https://github.com/apache/ratis/blob/3a51121adaf2145e4ec020f4c24858f9f03745d2/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L398
>
> Tsz-Wo
>
> On Fri, Oct 25, 2024 at 2:16 AM Sammi Chen <sammic...@apache.org> wrote:
>
> > Hi Tsz-Wo,
> >
> > What's the impact of https://issues.apache.org/jira/browse/RATIS-2149?
> > Will it cause OM HA leader auto election to fail in some circumstances?
> >
> > Thanks,
> > Sammi
> >
> > On Wed, 23 Oct 2024 at 05:43, Tsz Wo Sze <szets...@gmail.com> wrote:
> >
> > > +1 for releasing Ozone 1.4.1 with Ratis 3.1.1.
> > >
> > > Tsz-Wo
> > >
> > > On Tue, Oct 22, 2024 at 1:10 PM Ethan Rose <er...@apache.org> wrote:
> > >
> > > > Hi, any updates on the current 1.4.1 progress? Ratis 3.1.1 should be in Ozone now that HDDS-11504 <https://issues.apache.org/jira/browse/HDDS-11504> is resolved.
> > > > I see there’s discussion of doing a Ratis 3.1.2 to fix RATIS-2149 <https://issues.apache.org/jira/browse/RATIS-2149> and RATIS-2172 <https://issues.apache.org/jira/browse/RATIS-2172>, but our 1.4.1 release has already been delayed for a while, so I think we should ship with Ratis 3.1.1 and do a 1.4.2 release with just the patch version of Ratis if necessary.
> > > >
> > > > I see some new fixes targeting the release like HDDS-11223 <https://issues.apache.org/jira/browse/HDDS-11223> and HDDS-11136 <https://issues.apache.org/jira/browse/HDDS-11136>, which is good. What is the overall status update? Are we ready for the next release candidate?
> > > >
> > > > Ethan
> > > >
> > > > On Wed, Aug 21, 2024 at 12:33 PM Tsz Wo Sze <szets...@gmail.com> wrote:
> > > >
> > > > > > (2) Key put fails for large files (> 20GB) due to a memory leak in Ratis 3.1.0
> > > > > > ...
> > > > >
> > > > > Duong & Wei-chiu,
> > > > >
> > > > > Thanks for finding this problem!
> > > > >
> > > > > Agree that we should have a Ratis 3.1.1 release.
> > > > > BTW, "Memory leak" usually means that memory was allocated but not released; see https://en.wikipedia.org/wiki/Memory_leak . In this case, we are not having such a problem. Our problem is unnecessarily using too much memory.
> > > > >
> > > > > Tsz-Wo
> > > > >
> > > > > On Tue, Aug 20, 2024 at 6:20 PM Duong Nguyen <du...@cloudera.com.invalid> wrote:
> > > > >
> > > > > > I also filed https://issues.apache.org/jira/browse/RATIS-2141 to track the memory leak issue.
> > > > > >
> > > > > > Thanks,
> > > > > > Duong
> > > > > >
> > > > > > On Tue, Aug 20, 2024 at 6:17 PM Duong Nguyen <du...@cloudera.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I just started a thread to discuss releasing Ratis 3.1.1 with the fixes of the mentioned issues.
> > > > > > >
> > > > > > > Duong
> > > > > > >
> > > > > > > On Tue, Aug 20, 2024 at 5:30 PM Uma Maheswara Rao Gangumalla <umaganguma...@gmail.com> wrote:
> > > > > > >
> > > > > > >> Hi Wei-Chiu,
> > > > > > >>
> > > > > > >> Thank you and Duong for the important update on RC1.
> > > > > > >>
> > > > > > >> @Duong would you notify the Ratis community and ask whether they can make a quick release with just the above 2 fixes?
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Uma
> > > > > > >>
> > > > > > >> On Tue, Aug 20, 2024 at 4:51 PM Wei-Chiu Chuang <weic...@apache.org> wrote:
> > > > > > >>
> > > > > > >>> Hi, thanks for the effort.
> > > > > > >>> We are testing the latest Ozone master and Ratis 3.1.0 internally, and found a few critical issues.
> > > > > > >>>
> > > > > > >>> (1) RATIS-2132 <https://issues.apache.org/jira/browse/RATIS-2132>, which causes about a 10% performance regression.
> > > > > > >>> (2) Key put fails for large files (> 20GB) due to a memory leak in Ratis 3.1.0: it was a half-done feature of RATIS-1931. DataNode could crash due to out of memory.
> > > > > > >>>
> > > > > > >>> Both of them can only be fixed in Ratis.
> > > > > > >>> I'd suggest not using Ratis 3.1.0 in the Ozone 1.4.1 release.
> > > > > > >>>
> > > > > > >>> If we can, I'd ask the Ratis community to release Ratis 3.1.1 with the above two fixes.
> > > > > > >>>
> > > > > > >>> cc: @Duong Nguyen <du...@cloudera.com> who helped root cause the two issues.
> > > > > > >>>
> > > > > > >>> On Tue, Aug 20, 2024 at 3:31 PM Siyao Meng <si...@apache.org> wrote:
> > > > > > >>>
> > > > > > >>> > +1 (binding)
> > > > > > >>> >
> > > > > > >>> > - Verified signatures
> > > > > > >>> > - Verified checksums
> > > > > > >>> > - Checked ./bin/ozone version output from binary tarball
> > > > > > >>> > - Checked ./bin/ozone checknative output from binary tarball
> > > > > > >>> >    - rocks_tools_native lib check is missing, filed HDDS-11347 <https://issues.apache.org/jira/browse/HDDS-11347>, non-blocking.
> > > > > > >>> > - Checked source tarball content matched repo tag ozone-1.4.1-RC1
> > > > > > >>> > - Built from source (without native libs support)
> > > > > > >>> > - Verified compose/ozone Docker dev cluster boots up correctly with 3 Ozone datanodes.
> > > > > > >>> > - Verified basic volume, bucket, key creation and deletion works in Docker dev cluster.
> > > > > > >>> >    - Volume recursive deletion prompt is incorrect, filed HDDS-11346 <https://issues.apache.org/jira/browse/HDDS-11346>, non-blocking.
> > > > > > >>> >
> > > > > > >>> > -Siyao
> > > > > > >>> >
> > > > > > >>> > On Aug 19, 2024 at 6:39:08 AM, Ayush Saxena <ayush...@gmail.com> wrote:
> > > > > > >>> >
> > > > > > >>> > > +1 (Binding), some minor stuff which we should fix in the next release
> > > > > > >>> > >
> > > > > > >>> > > * Built from source
> > > > > > >>> > > * Verified Checksums
> > > > > > >>> > > * Verified Signatures
> > > > > > >>> > > * All source files have apache header
> > > > > > >>> > > * No code diff b/w the git tag & the contents of src tar (dependency-reduced-pom only in src tar, maybe that ain't required there)
> > > > > > >>> > > * Verified the output of ozone version
> > > > > > >>> > > * Ran some basic shell commands
> > > > > > >>> > > * Checked the NOTICE file: The year is *wrong*, it says 2022, it should be 2024 [1], should correct in next release
> > > > > > >>> > > * The NOTICE file inside the packaged Jars is *wrong*, It mentions *Apache Hadoop* & Copyright since 2006, it should be Apache Ozone, should fix in the next release.
> > > > > > >>> > > It currently prints:
> > > > > > >>> > > ```
> > > > > > >>> > > Apache Hadoop
> > > > > > >>> > > Copyright 2006 and onwards The Apache Software Foundation.
> > > > > > >>> > > .
> > > > > > >>> > > .
> > > > > > >>> > > Hadoop Yarn Server Web Proxy uses the BouncyCastle Java
> > > > > > >>> > > cryptography APIs written by the Legion of the Bouncy Castle Inc.
> > > > > > >>> > >
> > > > > > >>> > > ```
> > > > > > >>> > > You can try something like this to validate:
> > > > > > >>> > > jar xf share/ozone/lib/ozone-client-1.4.1.jar META-INF/NOTICE.txt
> > > > > > >>> > > cat META-INF/NOTICE.txt
> > > > > > >>> > >
> > > > > > >>> > > Thanx Xi Chen for driving the release, Good Luck!!!
> > > > > > >>> > >
> > > > > > >>> > > -Ayush
> > > > > > >>> > >
> > > > > > >>> > > [1] https://github.com/apache/ozone/blob/ozone-1.4.1-RC1/NOTICE.txt#L1-L2
> > > > > > >>> > >
> > > > > > >>> > > On Mon, 19 Aug 2024 at 11:20, Sammi Chen <sammic...@apache.org> wrote:
> > > > > > >>> > >
> > > > > > >>> > > +1 (binding)
> > > > > > >>> > >
> > > > > > >>> > > * Verified the signature and checksums
> > > > > > >>> > > * Verified tag
> > > > > > >>> > > * Built from source
> > > > > > >>> > > * Ran ozonesecure acceptance test
> > > > > > >>> > > * Started a cluster using bin package
> > > > > > >>> > > * Ran freon rk command with data verification
> > > > > > >>> > > * Verified information displayed on Recon UI, for both empty cluster and cluster with data
> > > > > > >>> > >
> > > > > > >>> > > Sammi
> > > > > > >>> > >
> > > > > > >>> > > On Fri, 16 Aug 2024 at 13:13, mrchenx <mrch...@126.com> wrote:
> > > > > > >>> > >
> > > > > > >>> > > > Dear Ozone Devs,
> > > > > > >>> > > >
> > > > > > >>> > > > As discussed in the last email, I am calling for a vote on Apache Ozone 1.4.1 RC1.
> > > > > > >>> > > >
> > > > > > >>> > > > We released 1.4.0 on Jan 19th. Since then, 177 new commits have landed on the 1.4.1 branch, including a Ratis upgrade (to Ratis 3.1.0), some bug fixes, performance optimizations, and some necessary dependencies.
> > > > > > >>> > > > I am calling for a vote on Apache Ozone 1.4.1 RC1.
> > > > > > >>> > > >
> > > > > > >>> > > > - The RC1 tag can be found on GitHub at:
> > > > > > >>> > > >   - https://github.com/apache/ozone/releases/tag/ozone-1.4.1-RC1
> > > > > > >>> > > > - 177 Jiras were cherry-picked for ozone-1.4.1:
> > > > > > >>> > > >   - https://issues.apache.org/jira/issues/?jql=project%20%3D%20HDDS%20AND%20fixVersion%20%3D%201.4.1
> > > > > > >>> > > > - The source and binary tarballs can be found at:
> > > > > > >>> > > >   - https://dist.apache.org/repos/dist/dev/ozone/1.4.1-rc1/
> > > > > > >>> > > > - Maven artifacts are staged at:
> > > > > > >>> > > >   - https://repository.apache.org/content/repositories/orgapacheozone-1024
> > > > > > >>> > > > - The public key used to sign the artifacts can be found at:
> > > > > > >>> > > >   - https://dist.apache.org/repos/dist/release/ozone/KEYS
> > > > > > >>> > > > - The fingerprint of the key used to sign the artifacts is:
> > > > > > >>> > > >   - 0D8C19F5514E2786007936F758C87003FF9A1A38
> > > > > > >>> > > >
> > > > > > >>> > > > The vote will run for 7 days, ending on Aug 23rd 2024 at 13:10 UTC+8.
> > > > > > >>> > > >
> > > > > > >>> > > > Thanks
> > > > > > >>> > > >
> > > > > > >>> > > > Xi Chen