Hey Stephen,
Thanks for initiating this.
I just had a look at HDFS-8538. It seems Arpit Agarwal & Tsz-wo-Sze raised a
couple of concerns there regarding write throughput and performance. The
discussion concluded with a solution in the end, as mentioned here:
https://issues.apache.org/jira/browse/HDFS-8538?focusedCommentId=14606094&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14606094

Do you plan to incorporate that solution before proceeding, or do the
concerns raised back then no longer apply? Any pointers on those concerns and
comments? It would be great if you could get a nod from them too.

Thanks
-Ayush

On Thu, 30 Apr 2020 at 09:32, Akira Ajisaka <aajis...@apache.org> wrote:

> +1 to change the default policy in Hadoop 3.4+.
>
> -Akira
>
> On Wed, Apr 29, 2020 at 1:28 AM Clay Baenziger (BLOOMBERG/ 919 3RD A) <
> cbaenzi...@bloomberg.net> wrote:
>
> > I can confirm that my group has run with Available Space for a number of
> > years on the 2.7.x line quite successfully.
> >
> > -Clay
> >
> > From: weic...@cloudera.com.INVALID At: 04/28/20 11:50:27 To:
> > sodonn...@cloudera.com.invalid
> > Cc:  hdfs-dev@hadoop.apache.org
> > Subject: Re: Changing the default Datanode Volume Choosing policy
> >
> > +1 to switch it on in Hadoop 3.4.0
> >
> > (1) it doesn't break any existing applications I am aware of.
> > (2) No noticeable performance regression in any cases observed.
> >
> > I feel compelled to make a feature the default if it is strictly better.
> > Hopefully we can make Hadoop easier to use in this way too.
> >
> > On Tue, Apr 28, 2020 at 8:36 AM Stephen O'Donnell
> > <sodonn...@cloudera.com.invalid> wrote:
> >
> > > Hi,
> > >
> > > A long time back there was a Jira raised to change the default volume
> > > choosing policy from Round Robin to Available Space:
> > >
> > > https://issues.apache.org/jira/browse/HDFS-8538
> > >
> > > At the time there were some objections / concerns about using available
> > > space.
> > >
> > > In the 5 years since then, at Cloudera we have seen about 1000 clusters
> > > running with Available Space enabled, and we have not seen any issues
> > > caused by it. It feels like this policy should be the default, as we
> > > have to change it more often than not.
> > >
> > > To recap, the Available Space policy places blocks on disks with more
> > > free space with a higher probability until all disks are within a
> > > threshold of free space from each other. After that it behaves in a
> > > round robin fashion. This means if a disk is replaced, it will slowly
> > > catch up to the usage of the others, and if you have disks of different
> > > sizes, they will self balance.
> > >
> > > I would like to ask:
> > >
> > > 1. Are there others in the community running the Available Space volume
> > > choosing policy, and if so, have you seen any issues, or does it run
> > > smoothly?
> > >
> > > 2. Does anyone have any strong objections to changing the default to
> > > Available Space from 3.4 onwards?
> > >
> > > Thanks,
> > >
> > > Stephen.
> > >
> >
> >
> >
>

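For readers following the thread, the behaviour recapped in Stephen's original
message can be sketched roughly as below. This is a simplified illustration
and not the Hadoop implementation: the class name, method names, and the
shared round-robin cursor are all hypothetical, and the default threshold
(10 GiB) and preference (0.75) are assumed values chosen for the example.

```python
import random

GIB = 1024 ** 3


class AvailableSpaceChooser:
    """Hypothetical sketch of an available-space volume chooser."""

    def __init__(self, threshold_bytes=10 * GIB, preference=0.75, rng=None):
        # Assumed defaults for illustration only.
        self.threshold_bytes = threshold_bytes
        self.preference = preference
        self.rng = rng or random.Random()
        self._next = 0  # single round-robin cursor (a simplification)

    def _round_robin(self, names):
        name = names[self._next % len(names)]
        self._next += 1
        return name

    def choose(self, free_bytes):
        """Pick a volume; free_bytes maps volume name -> free bytes."""
        names = sorted(free_bytes)
        least = min(free_bytes.values())
        most = max(free_bytes.values())

        # All disks within the threshold of each other: plain round robin.
        if most - least <= self.threshold_bytes:
            return self._round_robin(names)

        # Otherwise split disks into "low free space" and "high free space"
        # groups and favour the emptier group with the given probability.
        cutoff = least + self.threshold_bytes
        low = [n for n in names if free_bytes[n] <= cutoff]
        high = [n for n in names if free_bytes[n] > cutoff]
        pool = high if self.rng.random() < self.preference else low
        return self._round_robin(pool)
```

With a freshly replaced disk, for example
`chooser.choose({"old1": 100 * GIB, "old2": 100 * GIB, "new": 1000 * GIB})`,
the sketch returns "new" about three times out of four, so the new disk fills
faster until it comes within the threshold of the others.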