Thank you, Arpit Agarwal, for notifying me about this user mail.

Yes, the heap pressure is introduced by the block-layout migration. Right, the
high heap usage occurs only during the upgrade; once the upgrade is done,
heap usage goes back to normal. I have seen this issue on many clusters
(20+), but it is only noticeable on large datanodes (those with millions
of blocks).

IIUC, the high heap usage was introduced in 2.6.0 and 2.6.1 [HDFS-6482
<https://issues.apache.org/jira/browse/HDFS-6482> (2.6.0) and HDFS-7443
<https://issues.apache.org/jira/browse/HDFS-7443> (2.6.1)]:
2.6.0 =>
https://github.com/apache/hadoop/blame/branch-2.6.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java#L1060
2.6.1 =>
https://github.com/apache/hadoop/blame/branch-2.6.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java#L1067

As long as the datanode has enough memory, there is no issue. However, if the
DN is configured with less heap, CMS struggles a lot because there is
no memory to reclaim.
Multiple users reported that the datanode ran for more than 1 hour without even
completing block migration (for 3.3M blocks with a 6GB heap). It just
spent the time in GC cycles, where the overall JVM pause was ~37 minutes (it
just hangs).

I think we need to revisit these two JIRAs, mainly HDFS-6482
<https://issues.apache.org/jira/browse/HDFS-6482>.

Sorry if my understanding is wrong :)

-Karthik

On Wed, Oct 7, 2020 at 8:47 AM Kihwal Lee <kih...@verizonmedia.com.invalid>
wrote:

> We haven't experienced anything like that up to 2.8. We are still in the
> process of stabilizing 2.10 as we upgrade some of the bigger clusters. We
> will know soon how 2.10 datanodes behave under heavy load and storage
> utilization.
>
> If you are seeing a significant change, it might be something post-2.8 or
> even post-2.10.
>
> Kihwal
>
> On Tue, Oct 6, 2020 at 5:09 PM Wei-Chiu Chuang <weic...@cloudera.com>
> wrote:
>
> > Sorry for not being specific.
> > I was referring to HDFS-8791
> > <https://issues.apache.org/jira/browse/HDFS-8791> (block
> > ID-based DN storage layout can be very slow for datanode on ext4) where it
> > is in 2.8 and above.
> >
> > As I understand it, the increased heap usage only occurs during upgrade.
> > No issue afterwards.
> >
> > My experience was based on CDH5 to CDH6 upgrade (Hadoop 2.6 -> Hadoop 3.0)
> > and HDP2 to HDP3 (Hadoop 2.7 -> Hadoop 3.1) upgrade. It is nearly
> > impossible to tell which commit increases heap usage worse during upgrade.
> >
> >
> >
> > On Tue, Oct 6, 2020 at 3:01 PM Kihwal Lee <kih...@verizonmedia.com>
> wrote:
> >
> >> Which layout change are you referring to? The only layout change I know
> >> of was done in 2.7, IIRC. We backported that to 2.6 and did not see any
> >> adverse effects at that time.
> >>
> >> Is datanode using more heap all the time? Or is it running into trouble
> >> when generating full block reports?
> >>
> >> Kihwal
> >>
> >> On Mon, Oct 5, 2020 at 1:40 PM Wei-Chiu Chuang
> >> <weic...@cloudera.com.invalid> wrote:
> >>
> >>> We experienced this issue on CDH6 and HDP3, so roughly Hadoop 3.0.x and
> >>> 3.1.x.
> >>> Hermanth experienced the same issue on Hadoop 3.1.1 as well (HDFS-15569
> >>> <https://issues.apache.org/jira/browse/HDFS-15569>)
> >>>
> >>> On Mon, Oct 5, 2020 at 11:03 AM Igor Dvorzhak <i...@google.com> wrote:
> >>>
> >>> > What Hadoop 3 version do you use?
> >>> >
> >>> > On Mon, Oct 5, 2020 at 10:03 AM Wei-Chiu Chuang <weic...@apache.org>
> >>> > wrote:
> >>> >
> >>> >> I have anecdotally learned of multiple data points where during the
> >>> >> upgrading from Hadoop 2 to Hadoop 3, DN heap usage increases to the point
> >>> >> where it goes OOM.
> >>> >>
> >>> >> Don't have much logs for this issue, but I suspect it's caused by the
> >>> >> layout change added in Hadoop 2.8.0.
> >>> >>
> >>> >> Does anyone else observe the same issue and how do you mitigate this? For
> >>> >> now we suggested increasing DN heap size prior to upgrade as part of
> >>> >> pre-upgrade checklist.
> >>> >>
> >>> >> Thanks,
> >>> >> Wei-Chiu
> >>> >>
> >>> >
> >>>
> >>
>
