Thanks, Gabriel. I filed PHOENIX-2716. Would you mind investigating, Sergey? Maybe a good first step would be to revert PHOENIX-2649 and see if performance goes back to what it was? We can roll a new RC without it and then get it back in for 4.8.
On Fri, Feb 26, 2016 at 3:01 AM, Gabriel Reid <[email protected]> wrote:
> I just did a quick test run on this, and it looks to me like something
> is definitely wrong.
>
> I ran a simple ingest test for a table with 5 regions, and it appears
> that only a single HFile is being created. This HFile then needs to be
> recursively split during the step of handing HFiles over to the region
> servers (hence the "xxx no longer fits inside a single region.
> Splitting..." log messages).
>
> This implies that only a single reducer is actually doing any
> processing, which would certainly account for a performance
> degradation. My assumption is that the underlying issue is in the
> partitioner (or the data being passed to the partitioner). I don't
> know if this was introduced as part of PHOENIX-2649 or not.
>
> Sergey, are you (or someone else) able to take a look at this?
> Unfortunately, I don't think there's any way I can get a serious look
> at this any more today.
>
> - Gabriel
>
>
> On Fri, Feb 26, 2016 at 11:21 AM, Sergey Soldatov
> <[email protected]> wrote:
> > I see. We will try to reproduce it. The degradation is possible
> > because 4.6 had a problem described in PHOENIX-2649. In short, the
> > comparator for rowkeys was working incorrectly and reported that
> > all rowkeys are the same. If the input files are relatively small and
> > the reducer has enough memory, all records will be written in one step
> > with the same single rowkey. That could be the reason why it was
> > faster and there were no splits.
> >
> > Thanks,
> > Sergey
> >
> > On Fri, Feb 26, 2016 at 1:37 AM, 김영우 (YoungWoo Kim) <[email protected]> wrote:
> >> Sergey,
> >>
> >> I can't access the cluster right now, so I'll post details and
> >> configurations next week. Important facts, as far as I remember:
> >> - 8-node dev cluster (Hadoop 2.7.1, HBase 1.1.3, Phoenix 4.7.0 RC2 and
> >>   Zookeeper 3.4.6)
> >>   * 32 cores / 256 GB RAM; DataNode/NodeManager and RegionServer on the
> >>     same node; 24 GB heap assigned to the region server
> >> - # of tables = 9
> >> - Salted with 5, 10 or 20 buckets
> >> - Compressed using the Snappy codec
> >> - Data ingestion: 30 ~ 40 GB / day using bulk loading
> >> - Schema: the table I mentioned has 10 columns; 7 are VARCHAR and the
> >>   rest are VARCHAR[]
> >> I can see performance degradation on bulk load for other tables as well.
> >>
> >> Thanks,
> >>
> >> Youngwoo
> >>
> >>
> >> On Fri, Feb 26, 2016 at 6:02 PM, Sergey Soldatov <[email protected]> wrote:
> >>
> >>> Hi Youngwoo,
> >>> Could you provide a bit more information about the table structure
> >>> (DDL would be great)? Do you have indexes?
> >>>
> >>> Thanks,
> >>> Sergey
> >>>
> >>> On Tue, Feb 23, 2016 at 10:18 PM, 김영우 (Youngwoo Kim)
> >>> <[email protected]> wrote:
> >>> > Gabriel,
> >>> >
> >>> > I'm using RC2.
> >>> >
> >>> > Youngwoo
> >>> >
> >>> > On Wednesday, February 24, 2016, Gabriel Reid <[email protected]> wrote:
> >>> >
> >>> >> Hi Youngwoo,
> >>> >>
> >>> >> Which RC are you using for this? RC-1 or RC-2?
> >>> >>
> >>> >> Thanks,
> >>> >>
> >>> >> Gabriel
> >>> >>
> >>> >> On Tue, Feb 23, 2016 at 11:30 AM, 김영우 (YoungWoo Kim) <[email protected]> wrote:
> >>> >> > Hi,
> >>> >> >
> >>> >> > I'm evaluating the 4.7.0 RC on my dev cluster. It looks like it works
> >>> >> > fine, but I ran into a performance degradation for MR-based bulk
> >>> >> > loading. I've been loading a million rows per day into a Phoenix
> >>> >> > table. With the 4.7.0 RC, there are failed jobs with a '600 sec'
> >>> >> > timeout in the map or reduce stage. Logs as follows:
> >>> >> >
> >>> >> > 16/02/22 18:03:45 INFO mapreduce.Job: Task Id :
> >>> >> > attempt_1456035298774_0066_m_000002_0, Status : FAILED
> >>> >> > AttemptID:attempt_1456035298774_0066_m_000002_0 Timed out after 600 secs
> >>> >> >
> >>> >> > 16/02/22 18:05:14 INFO mapreduce.LoadIncrementalHFiles: HFile at
> >>> >> > hdfs://fcbig/tmp/74da7ab1-a8ac-4ba8-9d43-0b70f08f8602/HYNIX.BIG_TRACE_SUMMARY/0/_tmp/_tmp/f305427aa8304cf98355bf01c1edb5ce.top
> >>> >> > no longer fits inside a single region. Splitting...
> >>> >> >
> >>> >> > I have not seen these logs before, and I'm seeing about a 5~10x
> >>> >> > performance degradation for bulk loading (4.6.0: 10 min, but 60+ min
> >>> >> > with the 4.7.0 RC). Furthermore, I can't find a clue in the MR logs
> >>> >> > as to why the tasks failed.
> >>> >> >
> >>> >> > Also, I can see the HFile splitting after the reduce stage. Is that
> >>> >> > normal?
> >>> >> >
> >>> >> > My environment:
> >>> >> > - Hadoop 2.7.1
> >>> >> > - HBase 1.1.3
> >>> >> > - Phoenix 4.7.0 RC
> >>> >> >
> >>> >> > Thanks,
> >>> >> >
> >>> >> > Youngwoo
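
For anyone following along, here is a minimal, self-contained Java sketch of
the failure mode Gabriel and Sergey describe above. It is NOT the actual
Phoenix or HBase code; the class name and the simplified partition() logic are
invented for illustration. The real job uses HFileOutputFormat2 with a
total-order partitioner and region start keys as split points, but the idea is
the same: each rowkey is compared against the split points to pick a reducer,
so a comparator that incorrectly reports all rowkeys as equal (the
PHOENIX-2649-style bug) routes every record to the same partition, and a single
reducer writes one oversized HFile that LoadIncrementalHFiles then has to split
recursively.

    import java.util.Arrays;
    import java.util.Comparator;

    // Illustrative sketch only -- not Phoenix/HBase code.
    public class SingleReducerSketch {

        // Pick a reducer the way a total-order partitioner does: binary-search
        // the rowkey against the sorted split points (region boundaries).
        static int partition(byte[] rowKey, byte[][] splitPoints, Comparator<byte[]> cmp) {
            int idx = Arrays.binarySearch(splitPoints, rowKey, cmp);
            return idx < 0 ? -(idx + 1) : idx + 1;
        }

        public static void main(String[] args) {
            // 4 split points -> 5 partitions, mirroring a 5-region table.
            byte[][] splitPoints = { "b".getBytes(), "d".getBytes(), "f".getBytes(), "h".getBytes() };
            byte[][] rowKeys = { "a".getBytes(), "c".getBytes(), "e".getBytes(), "g".getBytes(), "z".getBytes() };

            // Correct comparator: keys spread across all 5 partitions.
            Comparator<byte[]> good = (x, y) -> new String(x).compareTo(new String(y));
            // Broken comparator in the spirit of PHOENIX-2649: every key compares
            // as equal, so every row is routed to the same partition and a single
            // reducer ends up writing one huge HFile.
            Comparator<byte[]> broken = (x, y) -> 0;

            for (byte[] k : rowKeys) {
                System.out.printf("key=%s  goodPartition=%d  brokenPartition=%d%n",
                        new String(k), partition(k, splitPoints, good), partition(k, splitPoints, broken));
            }
        }
    }

Running it shows the keys spreading over partitions 0 through 4 with the
correct comparator, while the broken comparator sends every key to the same
partition, which matches the single-HFile / "no longer fits inside a single
region. Splitting..." symptom in the logs above.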
