Exactly! Gabriel has described what I observed.

Many map and reduce tasks are launched, but only one or two tasks are
still running at the end of the job. It looks like the workload is
skewed onto a particular task.
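
For reference, the job in question is Phoenix's MR-based bulk load. A
minimal, hypothetical launcher is sketched below; the table name, input
path, and ZooKeeper quorum are placeholders, not my real values:

    // Hypothetical launcher for the MR bulk load discussed in this thread.
    // The table name, input path, and ZK quorum below are made up.
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

    public class BulkLoadLauncher {
        public static void main(String[] args) throws Exception {
            // CsvBulkLoadTool implements Hadoop's Tool interface, so it can
            // be driven through ToolRunner with CLI-style arguments.
            int exitCode = ToolRunner.run(new CsvBulkLoadTool(), new String[] {
                "--table", "MY_TABLE",          // placeholder table name
                "--input", "/data/daily.csv",   // placeholder HDFS input path
                "--zookeeper", "zk-host:2181"   // placeholder ZK quorum
            });
            System.exit(exitCode);
        }
    }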

Thanks,
Youngwoo

On Friday, February 26, 2016, Gabriel Reid <[email protected]> wrote:

> I just did a quick test run on this, and it looks to me like something
> is definitely wrong.
>
> I ran a simple ingest test for a table with 5 regions, and it appears
> that only a single HFile is being created. This HFile then needs to be
> recursively split during the step of handing HFiles over to the region
> servers (hence the "xxx no longer fits inside a single region.
> Splitting..." log messages).
>
> This implies that only a single reducer is actually doing any
> processing, which would certainly account for a performance
> degradation. My assumption is that the underlying issue is in the
> partitioner (or the data being passed to the partitioner). I don't
> know if this was introduced as part of PHOENIX-2649 or not.
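>
> To make that failure mode concrete: a partitioner whose input makes
> every row key look identical behaves like the hypothetical sketch
> below (an illustration only, not the actual Phoenix partitioner):
>
>     // Hypothetical, degenerate partitioner: when all keys "compare
>     // equal", every record lands in the same partition, so a single
>     // reducer does all the work and writes a single HFile.
>     import org.apache.hadoop.mapreduce.Partitioner;
>
>     public class DegeneratePartitioner<K, V> extends Partitioner<K, V> {
>         @Override
>         public int getPartition(K key, V value, int numPartitions) {
>             return 0; // all records -> reducer 0 -> one HFile
>         }
>     }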
>
> Sergey, are you (or someone else) able to take a look at this?
> Unfortunately, I don't think there's any way I can get a serious look
> at this any more today.
>
> - Gabriel
>
>
> On Fri, Feb 26, 2016 at 11:21 AM, Sergey Soldatov
> <[email protected]> wrote:
> > I see. We will try to reproduce it. The degradation is plausible
> > because 4.6 had the problem described in PHOENIX-2649. In short, the
> > comparator for row keys was working incorrectly and reported that all
> > row keys were the same. If the input files are relatively small and
> > the reducer has enough memory, all records will be written in one
> > step under that same single row key. That could be why it was faster
> > and there were no splits.
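> >
> > In code terms, the bug behaved like the hypothetical comparator below
> > (an illustration of the symptom, not the actual PHOENIX-2649 code):
> >
> >     // Hypothetical comparator showing the PHOENIX-2649 symptom: every
> >     // pair of row keys is reported as equal, so the MR sort/merge
> >     // machinery sees one logical key.
> >     import java.util.Comparator;
> >
> >     public class BrokenRowKeyComparator implements Comparator<byte[]> {
> >         @Override
> >         public int compare(byte[] a, byte[] b) {
> >             return 0; // bug: all row keys compare as equal
> >         }
> >     }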
> >
> > Thanks,
> > Sergey
> >
> > On Fri, Feb 26, 2016 at 1:37 AM, 김영우 (YoungWoo Kim)
> > <[email protected]> wrote:
> >> Sergey,
> >>
> >> I can't access the cluster right now, so I'll post details and
> >> configurations next week. The important facts, as far as I remember:
> >> - 8-node dev cluster (Hadoop 2.7.1, HBase 1.1.3, Phoenix 4.7.0 RC2 and
> >>   Zookeeper 3.4.6)
> >>   * 32 cores / 256 GB RAM per node; Datanode/Nodemanager and
> >>     RegionServer on the same node; 24 GB of heap assigned to each
> >>     region server
> >> - # of tables = 9
> >> - Salted with 5, 10 or 20 buckets
> >> - Compressed using the Snappy codec
> >> - Data ingestion: 30 ~ 40 GB / day using bulk loading
> >> - Schema
> >>   The table that I mentioned has 10 columns: 7 are VARCHAR and the
> >>   rest are VARCHAR[] (a hypothetical sketch of the DDL follows below).
> >>   I can see the performance degradation on bulk loads of other tables
> >>   as well.
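> >>
> >>   A hypothetical sketch of that shape (column names and connection
> >>   details are made up; the real DDL will follow next week):
> >>
> >>     // Hypothetical DDL matching the description above; the table and
> >>     // column names, bucket count, and ZK quorum are placeholders.
> >>     import java.sql.Connection;
> >>     import java.sql.DriverManager;
> >>     import java.sql.Statement;
> >>
> >>     public class CreateTableSketch {
> >>         public static void main(String[] args) throws Exception {
> >>             try (Connection conn =
> >>                      DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
> >>                  Statement stmt = conn.createStatement()) {
> >>                 stmt.execute(
> >>                     "CREATE TABLE EXAMPLE_TABLE (" +
> >>                     "  PK VARCHAR NOT NULL PRIMARY KEY," +
> >>                     "  C1 VARCHAR, C2 VARCHAR, C3 VARCHAR," +
> >>                     "  C4 VARCHAR, C5 VARCHAR, C6 VARCHAR," +
> >>                     "  A1 VARCHAR[], A2 VARCHAR[], A3 VARCHAR[]" +
> >>                     ") SALT_BUCKETS = 5, COMPRESSION = 'SNAPPY'");
> >>             }
> >>         }
> >>     }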
> >>
> >> Thanks,
> >>
> >> Youngwoo
> >>
> >>
> >>
> >> On Fri, Feb 26, 2016 at 6:02 PM, Sergey Soldatov
> >> <[email protected]> wrote:
> >>
> >>> Hi Youngwoo,
> >>> Could you provide a bit more information about the table structure
> >>> (DDL would be great)? Do you have indexes?
> >>>
> >>> Thanks,
> >>> Sergey
> >>>
> >>> On Tue, Feb 23, 2016 at 10:18 PM, 김영우 (Youngwoo Kim)
> >>> <[email protected]> wrote:
> >>> > Gabriel,
> >>> >
> >>> > I'm using RC2.
> >>> >
> >>> > Youngwoo
> >>> >
> >>> On Wednesday, February 24, 2016, Gabriel Reid <[email protected]> wrote:
> >>> >
> >>> >> Hi Youngwoo,
> >>> >>
> >>> >> Which RC are you using for this? RC-1 or RC-2?
> >>> >>
> >>> >> Thanks,
> >>> >>
> >>> >> Gabriel
> >>> >>
> >>> >> On Tue, Feb 23, 2016 at 11:30 AM, 김영우 (YoungWoo Kim)
> >>> >> <[email protected]> wrote:
> >>> >> > Hi,
> >>> >> >
> >>> >> > I'm evaluating the 4.7.0 RC on my dev cluster. It looks like it
> >>> >> > works fine, but I ran into a performance degradation for MR-based
> >>> >> > bulk loading. I've been loading a million rows per day into a
> >>> >> > Phoenix table. With the 4.7.0 RC, there are failed jobs with a
> >>> >> > '600 sec' timeout in the map or reduce stage. Logs as follows:
> >>> >> >
> >>> >> > 16/02/22 18:03:45 INFO mapreduce.Job: Task Id :
> >>> >> > attempt_1456035298774_0066_m_000002_0, Status : FAILED
> >>> >> > AttemptID:attempt_1456035298774_0066_m_000002_0 Timed out after 600 secs
> >>> >> >
> >>> >> > 16/02/22 18:05:14 INFO mapreduce.LoadIncrementalHFiles: HFile at
> >>> >> > hdfs://fcbig/tmp/74da7ab1-a8ac-4ba8-9d43-0b70f08f8602/HYNIX.BIG_TRACE_SUMMARY/0/_tmp/_tmp/f305427aa8304cf98355bf01c1edb5ce.top
> >>> >> > no longer fits inside a single region. Splitting...
> >>> >> >
> >>> >> > I had not seen these log messages before, and I'm seeing about a
> >>> >> > 5 ~ 10x performance degradation for bulk loading (4.6.0: 10 min,
> >>> >> > but 60+ min with the 4.7.0 RC). Furthermore, I can't find a clue
> >>> >> > in the MR logs as to why the tasks failed.
> >>> >> >
> >>> >> > Also, I can see the HFile splitting happening after the reduce
> >>> >> > stage. Is that normal?
> >>> >> >
> >>> >> > My envs are:
> >>> >> > - Hadoop 2.7.1
> >>> >> > - HBase 1.1.3
> >>> >> > - Phoenix 4.7.0 RC
> >>> >> >
> >>> >> > Thanks,
> >>> >> >
> >>> >> > Youngwoo
> >>> >>
> >>>
>
