Hi Tsz-Wo,

Thanks for the detailed explanation of RATIS-2149.

+1 for releasing 1.4.1 with Ratis 3.1.1.

Best,
Sammi

On Sat, 26 Oct 2024 at 00:08, Tsz Wo Sze <szets...@gmail.com> wrote:

> Hi Sammi,
>
> RATIS-2149 <https://issues.apache.org/jira/browse/RATIS-2149> is a minor
> problem which has been there for a very long time.  It says that, when a
> server starts, it may start a leader election even if the server is not
> ready.  The reporter said that they would call addGroup to a new server S
> before calling setConf to add S to the group.  When setConf has failed (not
> sure why) and S has started a leader election, S could get a NOT_IN_CONF
> reply and then shut down.
>
> It seems that they might have called addGroup incorrectly.  If we call
> addGroup with an empty group, the new server will start with the
> initializing state rather than the follower state; see [1].  Then, it won't
> start a leader election.
>
> [1]
>
> https://github.com/apache/ratis/blob/3a51121adaf2145e4ec020f4c24858f9f03745d2/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L398
>
> Tsz-Wo
>
>
>
>
> On Fri, Oct 25, 2024 at 2:16 AM Sammi Chen <sammic...@apache.org> wrote:
>
> > Hi Tsz-Wo,
> >
> > What's the impact of https://issues.apache.org/jira/browse/RATIS-2149?
> > Will it cause OM HA leader auto election to fail in some circumstances?
> >
> > Thanks,
> > Sammi
> >
> > On Wed, 23 Oct 2024 at 05:43, Tsz Wo Sze <szets...@gmail.com> wrote:
> >
> > > +1 for releasing Ozone 1.4.1 with Ratis 3.1.1.
> > >
> > > Tsz-Wo
> > >
> > > On Tue, Oct 22, 2024 at 1:10 PM Ethan Rose <er...@apache.org> wrote:
> > >
> > > > Hi, any updates on the current 1.4.1 progress? Ratis 3.1.1 should be in
> > > > Ozone now that HDDS-11504 <https://issues.apache.org/jira/browse/HDDS-11504>
> > > > is resolved. I see there’s discussion of doing a Ratis 3.1.2 to fix
> > > > RATIS-2149 <https://issues.apache.org/jira/browse/RATIS-2149> and
> > > > RATIS-2172 <https://issues.apache.org/jira/browse/RATIS-2172>, but our 1.4.1
> > > > release has already been delayed for a while, so I think we should ship with
> > > > Ratis 3.1.1 and do a 1.4.2 release with just the patch version of Ratis if
> > > > necessary.
> > > >
> > > > I see some new fixes targeting the release like HDDS-11223
> > > > <https://issues.apache.org/jira/browse/HDDS-11223> and HDDS-11136
> > > > <https://issues.apache.org/jira/browse/HDDS-11136>, which is good. What is
> > > > the overall status update? Are we ready for the next release candidate?
> > > >
> > > >
> > > > Ethan
> > > >
> > > > On Wed, Aug 21, 2024 at 12:33 PM Tsz Wo Sze <szets...@gmail.com> wrote:
> > > >
> > > > > > (2) Key put fails for large files (> 20GB) due to a memory leak in Ratis 3.1.0
> > > > > ...
> > > > >
> > > > > Duong & Wei-chiu,
> > > > >
> > > > > Thanks for finding this problem!
> > > > >
> > > > > Agree that we should have a Ratis 3.1.1 release.
> > > > > BTW, "Memory leak" usually means that memory was allocated but not
> > > > > released; see https://en.wikipedia.org/wiki/Memory_leak . In this case, we
> > > > > are not having such a problem. Our problem is unnecessarily using too much
> > > > > memory.
> > > > >
> > > > > Tsz-Wo
> > > > >
> > > > >
> > > > > On Tue, Aug 20, 2024 at 6:20 PM Duong Nguyen <du...@cloudera.com.invalid> wrote:
> > > > >
> > > > > > I also filed https://issues.apache.org/jira/browse/RATIS-2141 to track the
> > > > > > memory leak issue.
> > > > > >
> > > > > > Thanks,
> > > > > > Duong
> > > > > >
> > > > > > On Tue, Aug 20, 2024 at 6:17 PM Duong Nguyen <du...@cloudera.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I just started a thread to discuss releasing Ratis 3.1.1 with the fixes
> > > > > > > of the mentioned issues.
> > > > > > >
> > > > > > > Duong
> > > > > > >
> > > > > > > On Tue, Aug 20, 2024 at 5:30 PM Uma Maheswara Rao Gangumalla <
> > > > > > > umaganguma...@gmail.com> wrote:
> > > > > > >
> > > > > > >> Hi Wei-Chiu,
> > > > > > >>
> > > > > > >> Thank you and Duong for the important update on RC1.
> > > > > > >>
> > > > > > > >> @Duong, could you notify the Ratis community and ask whether they can
> > > > > > > >> make a quick release with just the above 2 fixes?
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Uma
> > > > > > >>
> > > > > > >>
> > > > > > > >> On Tue, Aug 20, 2024 at 4:51 PM Wei-Chiu Chuang <weic...@apache.org> wrote:
> > > > > > >>
> > > > > > > >>> Hi, thanks for the effort.
> > > > > > > >>> We are testing the latest Ozone master and Ratis 3.1.0 internally, and
> > > > > > > >>> found a few critical issues.
> > > > > > > >>>
> > > > > > > >>> (1) RATIS-2132 <https://issues.apache.org/jira/browse/RATIS-2132>, which
> > > > > > > >>> causes about a 10% performance regression.
> > > > > > > >>> (2) Key put fails for large files (> 20GB) due to a memory leak in Ratis
> > > > > > > >>> 3.1.0: it was a half-done feature of RATIS-1931. DataNode could crash due
> > > > > > > >>> to out of memory.
> > > > > > > >>>
> > > > > > > >>> Both of them can only be fixed in Ratis.
> > > > > > > >>> I'd suggest not using Ratis 3.1.0 in the Ozone 1.4.1 release.
> > > > > > > >>>
> > > > > > > >>> If we can, I'd ask the Ratis community to release Ratis 3.1.1 with the
> > > > > > > >>> above two fixes.
> > > > > > > >>>
> > > > > > > >>> cc: @Duong Nguyen <du...@cloudera.com> who helped root cause the two
> > > > > > > >>> issues.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > > >>> On Tue, Aug 20, 2024 at 3:31 PM Siyao Meng <si...@apache.org> wrote:
> > > > > > >>>
> > > > > > >>> >  +1 (binding)
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> >    - Verified signatures
> > > > > > >>> >    - Verified checksums
> > > > > > >>> >    - Checked ./bin/ozone version output from binary tarball
> > > > > > > >>> >    - Checked ./bin/ozone checknative output from binary tarball
> > > > > > > >>> >       - rocks_tools_native lib check is missing, filed HDDS-11347
> > > > > > > >>> >       <https://issues.apache.org/jira/browse/HDDS-11347>, non-blocking.
> > > > > > > >>> >       - Checked source tarball content matched repo tag ozone-1.4.1-RC1
> > > > > > > >>> >    - Built from source (without native libs support)
> > > > > > > >>> >    - Verified compose/ozone Docker dev cluster boots up correctly with 3
> > > > > > > >>> >    Ozone datanodes.
> > > > > > > >>> >    - Verified basic volume, bucket, key creation and deletion works in
> > > > > > > >>> >    Docker dev cluster.
> > > > > > > >>> >       - Volume recursive deletion prompt is incorrect, filed HDDS-11346
> > > > > > > >>> >       <https://issues.apache.org/jira/browse/HDDS-11346>, non-blocking.
> > > > > > >>> >
> > > > > > >>> >
> > > > > > >>> > -Siyao
> > > > > > >>> >
> > > > > > > >>> > On Aug 19, 2024 at 6:39:08 AM, Ayush Saxena <ayush...@gmail.com> wrote:
> > > > > > >>> >
> > > > > > > >>> > > +1 (Binding), some minor stuff which we should fix in the next release
> > > > > > > >>> > >
> > > > > > > >>> > > * Built from source
> > > > > > > >>> > > * Verified Checksums
> > > > > > > >>> > > * Verified Signatures
> > > > > > > >>> > > * All source files have apache header
> > > > > > > >>> > > * No code diff b/w the git tag & the contents of src tar
> > > > > > > >>> > > (dependency-reduced-pom only in src tar, maybe that ain't required there)
> > > > > > > >>> > > * Verified the output of ozone version
> > > > > > > >>> > > * Ran some basic shell commands
> > > > > > > >>> > > * Checked the NOTICE file: The year is *wrong*, it says 2022, it
> > > > > > > >>> > > should be 2024 [1], should correct in next release
> > > > > > > >>> > > * The NOTICE file inside the packaged Jars is *wrong*, It mentions
> > > > > > > >>> > > *Apache Hadoop* & Copyright since 2006, it should be Apache Ozone,
> > > > > > > >>> > > should fix in the next release.
> > > > > > >>> > > It currently prints:
> > > > > > >>> > > ```
> > > > > > >>> > > Apache Hadoop
> > > > > > >>> > > Copyright 2006 and onwards The Apache Software
> Foundation.
> > > > > > >>> > > .
> > > > > > >>> > > .
> > > > > > >>> > > Hadoop Yarn Server Web Proxy uses the BouncyCastle Java
> > > > > > >>> > > cryptography APIs written by the Legion of the Bouncy
> > Castle
> > > > Inc.
> > > > > > >>> > >
> > > > > > >>> > > ```
> > > > > > >>> > > Can try something like to validate:
> > > > > > >>> > > jar xf share/ozone/lib/ozone-client-1.4.1.jar
> > > > META-INF/NOTICE.txt
> > > > > > >>> > > cat META-INF/NOTICE.txt
> > > > > > >>> > >
> > > > > > >>> > > Thanx Xi Chen for driving the release, Good Luck!!!
> > > > > > >>> > >
> > > > > > >>> > > -Ayush
> > > > > > >>> > >
> > > > > > >>> > > [1]
> > > > > > >>> >
> > > > > >
> > > https://github.com/apache/ozone/blob/ozone-1.4.1-RC1/NOTICE.txt#L1-L2
> > > > > > >>> > >
> > > > > > > >>> > > On Mon, 19 Aug 2024 at 11:20, Sammi Chen <sammic...@apache.org> wrote:
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> > > +1 (binding)
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> > > * Verified the signature and checksums
> > > > > > >>> > >
> > > > > > >>> > > * Verified tag
> > > > > > >>> > >
> > > > > > >>> > > * Build from source
> > > > > > >>> > >
> > > > > > >>> > > * Run ozonesecure acceptance test
> > > > > > >>> > >
> > > > > > >>> > > * Start a cluster using bin package
> > > > > > >>> > >
> > > > > > >>> > > * Run freon rk command with data verification
> > > > > > >>> > >
> > > > > > > >>> > > * Verified information displayed on Recon UI, for both empty cluster and
> > > > > > > >>> > > cluster with data
> > > > > > >>> > >
> > > > > > >>> > > Sammi
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > > >>> > > On Fri, 16 Aug 2024 at 13:13, mrchenx <mrch...@126.com> wrote:
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > > >>> > > > Dear Ozone Devs,
> > > > > > > >>> > > >
> > > > > > > >>> > > > As discussed in the last email, I am calling for a vote on Apache
> > > > > > > >>> > > > Ozone 1.4.1 RC1.
> > > > > > > >>> > > >
> > > > > > > >>> > > > We released 1.4.0 on Jan 19th. Now there are 177 new commits already
> > > > > > > >>> > > > landed on the 1.4.1 branch, including a Ratis upgrade (upgrade to
> > > > > > > >>> > > > Ratis 3.1.0), some bug fixes, as well as performance optimizations,
> > > > > > > >>> > > > and some necessary dependencies.
> > > > > > > >>> > > >
> > > > > > > >>> > > > I am calling for a vote on Apache Ozone 1.4.1 RC1.
> > > > > > > >>> > > >
> > > > > > > >>> > > >    - The RC1 tag can be found on GitHub at:
> > > > > > > >>> > > >         - https://github.com/apache/ozone/releases/tag/ozone-1.4.1-RC1
> > > > > > > >>> > > >    - 177 Jiras were cherry-picked for ozone-1.4.1:
> > > > > > > >>> > > >         - https://issues.apache.org/jira/issues/?jql=project%20%3D%20HDDS%20AND%20fixVersion%20%3D%201.4.1
> > > > > > > >>> > > >    - The source and binary tarballs can be found at:
> > > > > > > >>> > > >         - https://dist.apache.org/repos/dist/dev/ozone/1.4.1-rc1/
> > > > > > > >>> > > >    - Maven artifacts are staged at:
> > > > > > > >>> > > >         - https://repository.apache.org/content/repositories/orgapacheozone-1024
> > > > > > > >>> > > >    - The public key used to sign the artifacts can be found at:
> > > > > > > >>> > > >         - https://dist.apache.org/repos/dist/release/ozone/KEYS
> > > > > > > >>> > > >    - The fingerprint of the key used to sign the artifacts is:
> > > > > > > >>> > > >         - 0D8C19F5514E2786007936F758C87003FF9A1A38
> > > > > > > >>> > > >
> > > > > > > >>> > > > The vote will run for 7 days, ending on Aug 23rd 2024 at 13:10 UTC+8.
> > > > > > > >>> > > >
> > > > > > > >>> > > > Thanks
> > > > > > > >>> > > >
> > > > > > > >>> > > > Xi Chen
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > > >>> > > ---------------------------------------------------------------------
> > > > > > > >>> > > To unsubscribe, e-mail: dev-unsubscr...@ozone.apache.org
> > > > > > > >>> > > For additional commands, e-mail: dev-h...@ozone.apache.org
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
