Sergey,

I can't access the cluster right now, so I'll post the details and configurations next week. The important facts, as far as I remember:

- 8-node dev cluster (Hadoop 2.7.1, HBase 1.1.3, Phoenix 4.7.0 RC2 and ZooKeeper 3.4.6)
  * 32 cores / 256 GB RAM per node; DataNode/NodeManager and RegionServer co-located on the same node; 24 GB heap assigned to each region server
- Number of tables: 9
- Salted with 5, 10 or 20 buckets
- Compressed using the Snappy codec
- Data ingestion: 30 ~ 40 GB / day using bulk loading
- Schema: the table I mentioned has 10 columns; 7 are VARCHAR and the rest are VARCHAR[]

I can see the same bulk load performance degradation on the other tables as well.
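From memory, the DDL looks roughly like this. This is only a sketch: the column names and the primary key constraint below are placeholders, not the real schema, and I'll post the exact DDL next week.

```sql
-- Rough sketch from memory; column names and the PK are placeholders.
-- SALT_BUCKETS is 5, 10 or 20 depending on the table, and all tables
-- use Snappy compression.
CREATE TABLE IF NOT EXISTS HYNIX.BIG_TRACE_SUMMARY (
    C1  VARCHAR NOT NULL,
    C2  VARCHAR,
    C3  VARCHAR,
    C4  VARCHAR,
    C5  VARCHAR,
    C6  VARCHAR,
    C7  VARCHAR,
    C8  VARCHAR[],
    C9  VARCHAR[],
    C10 VARCHAR[],
    CONSTRAINT PK PRIMARY KEY (C1)
) SALT_BUCKETS = 20, COMPRESSION = 'SNAPPY';
```

There are no secondary indexes on this table as far as I recall.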
Thanks,
Youngwoo

On Fri, Feb 26, 2016 at 6:02 PM, Sergey Soldatov <[email protected]> wrote:

> Hi Youngwoo,
> Could you provide a bit more information about the table structure
> (DDL would be great)? Do you have indexes?
>
> Thanks,
> Sergey
>
> On Tue, Feb 23, 2016 at 10:18 PM, 김영우 (Youngwoo Kim)
> <[email protected]> wrote:
> > Gabriel,
> >
> > I'm using RC2.
> >
> > Youngwoo
> >
> > On Wednesday, February 24, 2016, Gabriel Reid <[email protected]> wrote:
> >
> >> Hi Youngwoo,
> >>
> >> Which RC are you using for this? RC-1 or RC-2?
> >>
> >> Thanks,
> >>
> >> Gabriel
> >>
> >> On Tue, Feb 23, 2016 at 11:30 AM, 김영우 (YoungWoo Kim) <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > I'm evaluating the 4.7.0 RC on my dev cluster. It looks like it works
> >> > fine, but I run into a performance degradation with MR-based bulk
> >> > loading. I've been loading millions of rows per day into a Phoenix
> >> > table. With the 4.7.0 RC, there are failed jobs with a '600 sec'
> >> > timeout in the map or reduce stage. Logs as follows:
> >> >
> >> > 16/02/22 18:03:45 INFO mapreduce.Job: Task Id :
> >> > attempt_1456035298774_0066_m_000002_0, Status : FAILED
> >> > AttemptID:attempt_1456035298774_0066_m_000002_0 Timed out after 600 secs
> >> >
> >> > 16/02/22 18:05:14 INFO mapreduce.LoadIncrementalHFiles: HFile at
> >> > hdfs://fcbig/tmp/74da7ab1-a8ac-4ba8-9d43-0b70f08f8602/HYNIX.BIG_TRACE_SUMMARY/0/_tmp/_tmp/f305427aa8304cf98355bf01c1edb5ce.top
> >> > no longer fits inside a single region. Splitting...
> >> >
> >> > These log messages never appeared before, and I'm seeing about 5 ~ 10x
> >> > performance degradation for bulk loading (4.6.0: 10 min, but 60+ min
> >> > with the 4.7.0 RC). Furthermore, I can't find a clue in the MR logs
> >> > as to why the tasks failed.
> >> >
> >> > Also, I can see the HFile splitting happening after the reduce stage.
> >> > Is that normal?
> >> >
> >> > My env is:
> >> > - Hadoop 2.7.1
> >> > - HBase 1.1.3
> >> > - Phoenix 4.7.0 RC
> >> >
> >> > Thanks,
> >> >
> >> > Youngwoo
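P.S. For anyone else hitting the same failures: the "600 secs" in the log is Hadoop's default mapreduce.task.timeout (600000 ms). While the slowdown itself is investigated, the timeout can be raised per job so the tasks at least complete; the value of 1800000 below is just an example, and this only hides the symptom, it does not fix the underlying degradation.

```xml
<!-- mapred-site.xml, or pass -Dmapreduce.task.timeout=1800000 on the
     job command line. Value is in milliseconds; 0 disables the timeout.
     Workaround only: it lets slow tasks finish instead of being killed. -->
<property>
  <name>mapreduce.task.timeout</name>
  <value>1800000</value>
</property>
```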
