Thanks Sumit for the review and comments.

Sorry for a late update. With the aggregated feedback and a discussion with
Symious, we decided to remove SCM as the proxy of disk balancer command to
datanode, so CLI will send directly to datanode. Here are the JIRAs of this
major change, HDDS-13598 <https://issues.apache.org/jira/browse/HDDS-13598>
 , HDDS-13878 <https://issues.apache.org/jira/browse/HDDS-13878>.  And
introduced a feature flag HDDS-13497
<https://issues.apache.org/jira/browse/HDDS-13497>.

With the above changes, all major comments are addressed.  Please help to
take another review and cast your vote for disk balancer branch merge.


Thanks,
Sammi

On Fri, 5 Sept 2025 at 20:08, Sumit Agrawal
<[email protected]> wrote:

> Hi All,
>
> Few feedback from my side over design:
> 1. SCM act as proxy for CLI command like start / stop / config changes but
> it has issue due to SCM async nature (API result consistency),
>   a. Command results for successful apply or failure of change can not be
> shown in *expected* *time* (for a happy path will take at least 2 HB time,
> i.e. 1 min)
>   b. Command rejection failure is high, based on SCM state (especially for
> all DN cases) like DN state un-healthy or not available or restarted, SCM
> restart, ...
> IMO, we can remove this CLI command execution via SCM but can be direct
> from Tool CLI itself as its redundant job which CLI also can do
> efficiently.
>
> 2. The DN side having a file as persistence of configuration is complex and
> error prone. DB can be used instead to keep configuration as there for open
> containers.
>   a. file format versioning code for upgrade -- having additional code
>   b. file consistency, recovery on failure and atomic write -- additional
> code is there
>   c. file path configuration
>   d. amount of code is high 2-3kloc for this which can be handled in a few
> 100s.
>   So I think we should replace the file with DB for keeping configuration
> in an atomic way which DB has inbuilt features and *proto* provides
> implicit *upgrade capability.*
>
> IMO, we need to make the above 2 changes with respect to code
> implementation. Others can share opinions.
>
> Regards
> Sumit
>
> On Fri, Aug 22, 2025 at 9:33 PM Tsz-Wo Nicholas Sze <[email protected]>
> wrote:
>
> > > The SlidingWindow class is not related to disk balancer, ...
> >
> > Ethan, thanks for pointing it out!  Then, we should not remove it here.
> >
> > Tsz-Wo
> >
> >
> > On Thu, Aug 21, 2025 at 1:18 PM Ethan Rose <[email protected]> wrote:
> >
> > > Thanks for working to update this for merge.
> > >
> > > The SlidingWindow class is not related to disk balancer, it has been
> > added
> > > to master to be used by the volume scanner. It was added in HDDS-12852
> > > <https://issues.apache.org/jira/browse/HDDS-12852> and is being used
> in
> > > HDDS-13108 <https://issues.apache.org/jira/browse/HDDS-13108> /
> > > https://github.com/apache/ozone/pull/8843. Please don't remove it.
> > >
> > > Please update the wiki with the justification for why there are no
> > > incompatible changes as described in this thread. Currently it still
> > > says "There
> > > should not be any incompatible changes introduced with this feature."
> but
> > > provides no justification.
> > >
> > > Regarding the feature flag, the status communicated in this thread must
> > > also be clear to end users. Until this is considered stable and enabled
> > by
> > > default, please:
> > > - Update the user doc added in
> https://github.com/apache/ozone/pull/8837
> > > to
> > > indicate that the feature is considered experimental/beta.
> Additionally,
> > I
> > > don't see a mention of the feature flag in this doc.
> > > - Update the config description added in
> > > https://github.com/apache/ozone/pull/8869 to indicate the same.
> > > - Update the CLI help message to indicate the same.
> > > - Hide the CLI so it does not appear in help messages.
> > >
> > > Additionally, please make all the CLI flags kebab-case. They are using
> a
> > > mix of kebab and camel case right now. Ideally the command itself would
> > be
> > > `disk-balancer` as well, but since we already have `containerbalancer`
> we
> > > might want to leave that part as is for consistency.
> > >
> > > Thanks,
> > > Ethan
> > >
> >
>
>
> --
> *Sumit Agrawal* | Senior Staff Engineer
> cloudera.com <https://www.cloudera.com>
> [image: Cloudera] <https://www.cloudera.com/>
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------
>

Reply via email to