Re: 4.7.0 RC, Bulk loading performance degradation and failed MR tasks

Youngwoo Kim Fri, 26 Feb 2016 01:19:48 -0800

Rajeshbabu,

I can't connect to the cluster right now, so I'll post the logs and details
next week.


Thanks,

Youngwoo

On Fri, Feb 26, 2016 at 3:14 PM, [email protected] <
[email protected]> wrote:

> Hi Youngwoo,
>
> Can you share the full logs (stderr, syslog, stdout) of the FAILED task
> attempt_1456035298774_0066_m_000002_0?
>
> 16/02/22 18:03:45 INFO mapreduce.Job: Task Id :
> attempt_1456035298774_0066_m_000002_0, Status : FAILED
> AttemptID:attempt_1456035298774_0066_m_000002_0 Timed out after 600 secs
>
> Thanks,
> Rajeshbabu.
>
> On Fri, Feb 26, 2016 at 10:21 AM, 김영우 (Youngwoo Kim) <[email protected]>
> wrote:
>
> > Hi,
> >
> > I'm looking into the logs from the failed MR tasks and found the following:
> >
> > 2016-02-26 10:34:12,663 INFO [hconnection-0x1efb7582-shared--pool2-t4]
> > org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=13,
> > retries=35, started=213888 ms ago, cancelled=false, msg=row '' on table
> > 'SYSTEM.CATALOG' at
> > region=SYSTEM.CATALOG,,1453257315715.d5e9564b98cf035163a8e4270333e6cf.,
> > hostname=fcbigstg05,16020,1456038654966, seqNum=164402
> > 2016-02-26 10:34:39,433 INFO [hconnection-0x1efb7582-shared--pool2-t4]
> > org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=14,
> > retries=35, started=240658 ms ago, cancelled=false, msg=row '' on table
> > 'SYSTEM.CATALOG' at
> > region=SYSTEM.CATALOG,,1453257315715.d5e9564b98cf035163a8e4270333e6cf.,
> > hostname=fcbigstg05,16020,1456038654966, seqNum=164414
> > 2016-02-26 10:35:06,914 INFO [hconnection-0x1efb7582-shared--pool2-t4]
> > org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=15,
> > retries=35, started=268139 ms ago, cancelled=false, msg=row '' on table
> > 'SYSTEM.CATALOG' at
> > region=SYSTEM.CATALOG,,1453257315715.d5e9564b98cf035163a8e4270333e6cf.,
> > hostname=fcbigstg05,16020,1456038654966, seqNum=164427
> > 2016-02-26 10:35:44,354 INFO [hconnection-0x1efb7582-shared--pool2-t4]
> > org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=16,
> > retries=35, started=305579 ms ago, cancelled=false, msg=row '' on table
> > 'SYSTEM.CATALOG' at
> > region=SYSTEM.CATALOG,,1453257315715.d5e9564b98cf035163a8e4270333e6cf.,
> > hostname=fcbigstg05,16020,1456038654966, seqNum=164461
> > 2016-02-26 10:36:10,970 INFO [hconnection-0x1efb7582-shared--pool2-t4]
> > org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=17,
> > retries=35, started=332195 ms ago, cancelled=false, msg=row '' on table
> > 'SYSTEM.CATALOG' at
> > region=SYSTEM.CATALOG,,1453257315715.d5e9564b98cf035163a8e4270333e6cf.,
> > hostname=fcbigstg05,16020,1456038654966, seqNum=164471
> > 2016-02-26 10:36:37,937 INFO [hconnection-0x1efb7582-shared--pool2-t4]
> > org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=18,
> > retries=35, started=359162 ms ago, cancelled=false, msg=row '' on table
> > 'SYSTEM.CATALOG' at
> > region=SYSTEM.CATALOG,,1453257315715.d5e9564b98cf035163a8e4270333e6cf.,
> > hostname=fcbigstg05,16020,1456038654966, seqNum=164486
> > 2016-02-26 10:36:58,126 INFO [hconnection-0x1efb7582-shared--pool2-t4]
> > org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=19,
> > retries=35, started=379351 ms ago, cancelled=false, msg=row '' on table
> > 'SYSTEM.CATALOG' at
> > region=SYSTEM.CATALOG,,1453257315715.d5e9564b98cf035163a8e4270333e6cf.,
> > hostname=fcbigstg05,16020,1456038654966, seqNum=164538
> > 2016-02-26 10:37:22,034 INFO [hconnection-0x1efb7582-shared--pool2-t4]
> > org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=20,
> > retries=35, started=403259 ms ago, cancelled=false, msg=row '' on table
> > 'SYSTEM.CATALOG' at
> > region=SYSTEM.CATALOG,,1453257315715.d5e9564b98cf035163a8e4270333e6cf.,
> > hostname=fcbigstg05,16020,1456038654966, seqNum=164546
> > 2016-02-26 10:37:53,904 INFO [hconnection-0x1efb7582-shared--pool2-t4]
> > org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=21,
> > retries=35, started=435129 ms ago, cancelled=false, msg=row '' on table
> > 'SYSTEM.CATALOG' at
> > region=SYSTEM.CATALOG,,1453257315715.d5e9564b98cf035163a8e4270333e6cf.,
> > hostname=fcbigstg05,16020,1456038654966, seqNum=164561
> > 2016-02-26 10:38:41,916 INFO [hconnection-0x1efb7582-shared--pool2-t4]
> > org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=22,
> > retries=35, started=483141 ms ago, cancelled=false, msg=row '' on table
> > 'SYSTEM.CATALOG' at
> > region=SYSTEM.CATALOG,,1453257315715.d5e9564b98cf035163a8e4270333e6cf.,
> > hostname=fcbigstg05,16020,1456038654966, seqNum=164656
> > 2016-02-26 10:39:09,105 INFO [hconnection-0x1efb7582-shared--pool2-t4]
> > org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=23,
> > retries=35, started=510330 ms ago, cancelled=false, msg=row '' on table
> > 'SYSTEM.CATALOG' at
> > region=SYSTEM.CATALOG,,1453257315715.d5e9564b98cf035163a8e4270333e6cf.,
> > hostname=fcbigstg05,16020,1456038654966, seqNum=164664
> >
> >
> > I'm not sure whether this helps to find the root cause.
> >
> > Thanks,
> >
> > Youngwoo
> >
> > On Thu, Feb 25, 2016 at 2:11 AM, James Taylor <[email protected]>
> > wrote:
> >
> > > Anyone else seeing performance issues for bulk loading? Sergey? Enis?
> > > Rajeshbabu?
> > >
> > > FYI, we plan to roll a new RC today.
> > >
> > > Thanks,
> > > James
> > >
> > > On Tue, Feb 23, 2016 at 10:18 PM, 김영우 (Youngwoo Kim) <[email protected]>
> > > wrote:
> > >
> > > > Gabriel,
> > > >
> > > > I'm using RC2.
> > > >
> > > > Youngwoo
> > > >
> > > > On Wednesday, February 24, 2016, Gabriel Reid <[email protected]> wrote:
> > > >
> > > > > Hi Youngwoo,
> > > > >
> > > > > Which RC are you using for this? RC-1 or RC-2?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Gabriel
> > > > >
> > > > > On Tue, Feb 23, 2016 at 11:30 AM, 김영우 (YoungWoo Kim) <[email protected]>
> > > > > wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I'm evaluating the 4.7.0 RC on my dev cluster. It looks like it
> > > > > > works fine, but I have run into performance degradation for
> > > > > > MR-based bulk loading. I've been loading a million rows per day
> > > > > > into a Phoenix table. Since the 4.7.0 RC, there are failed jobs
> > > > > > with a '600 sec' timeout in the map or reduce stage. The logs are
> > > > > > as follows:
> > > > > >
> > > > > > 16/02/22 18:03:45 INFO mapreduce.Job: Task Id :
> > > > > > attempt_1456035298774_0066_m_000002_0, Status : FAILED
> > > > > > AttemptID:attempt_1456035298774_0066_m_000002_0 Timed out after 600 secs
> > > > > >
> > > > > > 16/02/22 18:05:14 INFO mapreduce.LoadIncrementalHFiles: HFile at
> > > > > > hdfs://fcbig/tmp/74da7ab1-a8ac-4ba8-9d43-0b70f08f8602/HYNIX.BIG_TRACE_SUMMARY/0/_tmp/_tmp/f305427aa8304cf98355bf01c1edb5ce.top
> > > > > > no longer fits inside a single region. Splitting...
> > > > > >
> > > > > > I had not seen these log messages before. I'm now facing about a
> > > > > > 5 ~ 10x performance degradation for bulk loading (4.6.0: 10 min,
> > > > > > but 60+ min with the 4.7.0 RC). Furthermore, I can't find a clue
> > > > > > in the MR logs as to why the tasks failed.
> > > > > >
> > > > > > Also, I can see HFile splitting after the reduce stage. Is that
> > > > > > normal?
> > > > > >
> > > > > > My envs are:
> > > > > > - Hadoop 2.7.1
> > > > > > - HBase 1.1.3
> > > > > > - Phoenix 4.7.0 RC
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Youngwoo
> > > > >
> > > >
> > >
> >
>
