I just did a quick test run on this, and it looks to me like something is definitely wrong.
I ran a simple ingest test on a table with 5 regions, and it appears that
only a single HFile is being created. This HFile then needs to be
recursively split during the step of handing HFiles over to the region
servers (hence the "xxx no longer fits inside a single region.
Splitting..." log messages). This implies that only a single reducer is
actually doing any processing, which would certainly account for a
performance degradation.

My assumption is that the underlying issue is in the partitioner (or in
the data being passed to the partitioner). I don't know whether this was
introduced as part of PHOENIX-2649 or not.

Sergey, are you (or someone else) able to take a look at this?
Unfortunately, I don't think there's any way I can take a serious look at
this today.

- Gabriel

On Fri, Feb 26, 2016 at 11:21 AM, Sergey Soldatov <[email protected]> wrote:
> I see. We will try to reproduce it. The degradation is possible because
> 4.6 had the problem described in PHOENIX-2649. In short, the comparator
> for row keys was working incorrectly and reported that all row keys were
> the same. If the input files are relatively small and the reducer has
> enough memory, all records are written in one step under the same single
> row key. That may be the reason why it was faster and there were no
> splits.
>
> Thanks,
> Sergey
>
> On Fri, Feb 26, 2016 at 1:37 AM, 김영우 (YoungWoo Kim) <[email protected]> wrote:
>> Sergey,
>>
>> I can't access the cluster right now, so I'll post details and
>> configurations next week. The important facts, as far as I remember:
>> - 8-node dev cluster (Hadoop 2.7.1, HBase 1.1.3, Phoenix 4.7.0 RC2 and
>>   Zookeeper 3.4.6)
>>   * 32 cores / 256 GB RAM per node; DataNode/NodeManager and
>>     RegionServer on the same node; 24 GB heap assigned to the region
>>     server
>> - Number of tables: 9
>> - Salted with 5, 10 or 20 buckets
>> - Compressed using the Snappy codec
>> - Data ingestion: 30 ~ 40 GB / day using bulk loading
>> - Schema: the table I mentioned has 10 columns; 7 are varchar and the
>>   rest are varchar[].
>> I can see the performance degradation on bulk loads into other tables
>> as well.
>>
>> Thanks,
>>
>> Youngwoo
>>
>> On Fri, Feb 26, 2016 at 6:02 PM, Sergey Soldatov <[email protected]>
>> wrote:
>>
>>> Hi Youngwoo,
>>> Could you provide a bit more information about the table structure
>>> (the DDL would be great)? Do you have indexes?
>>>
>>> Thanks,
>>> Sergey
>>>
>>> On Tue, Feb 23, 2016 at 10:18 PM, 김영우 (Youngwoo Kim)
>>> <[email protected]> wrote:
>>> > Gabriel,
>>> >
>>> > I'm using RC2.
>>> >
>>> > Youngwoo
>>> >
>>> > On Wednesday, February 24, 2016, Gabriel Reid <[email protected]> wrote:
>>> >
>>> >> Hi Youngwoo,
>>> >>
>>> >> Which RC are you using for this? RC-1 or RC-2?
>>> >>
>>> >> Thanks,
>>> >>
>>> >> Gabriel
>>> >>
>>> >> On Tue, Feb 23, 2016 at 11:30 AM, 김영우 (YoungWoo Kim) <[email protected]>
>>> >> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > I'm evaluating the 4.7.0 RC on my dev cluster. It looks like it
>>> >> > works fine, but I've run into a performance degradation for
>>> >> > MR-based bulk loading. I've been loading a million rows per day
>>> >> > into a Phoenix table. With the 4.7.0 RC, there are failed jobs
>>> >> > with a '600 sec' timeout in the map or reduce stage. Logs as
>>> >> > follows:
>>> >> >
>>> >> > 16/02/22 18:03:45 INFO mapreduce.Job: Task Id :
>>> >> > attempt_1456035298774_0066_m_000002_0, Status : FAILED
>>> >> > AttemptID:attempt_1456035298774_0066_m_000002_0 Timed out after 600 secs
>>> >> >
>>> >> > 16/02/22 18:05:14 INFO mapreduce.LoadIncrementalHFiles: HFile at
>>> >> > hdfs://fcbig/tmp/74da7ab1-a8ac-4ba8-9d43-0b70f08f8602/HYNIX.BIG_TRACE_SUMMARY/0/_tmp/_tmp/f305427aa8304cf98355bf01c1edb5ce.top
>>> >> > no longer fits inside a single region. Splitting...
>>> >> >
>>> >> > These logs were not seen before, and I'm seeing about a 5 ~ 10x
>>> >> > performance degradation for bulk loading (4.6.0: 10 min, but 60+
>>> >> > min with the 4.7.0 RC). Furthermore, I can't find a clue in the MR
>>> >> > logs as to why the tasks failed.
>>> >> >
>>> >> > Also, I can see the HFile splitting after the reduce stage. Is
>>> >> > that normal?
>>> >> >
>>> >> > My environment:
>>> >> > - Hadoop 2.7.1
>>> >> > - HBase 1.1.3
>>> >> > - Phoenix 4.7.0 RC
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> > Youngwoo
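The failure mode Gabriel and Sergey describe above can be sketched in a few
lines of standalone Java. The class below is a hypothetical illustration,
not Phoenix or HBase source; the class name, split points and comparators
are invented. It shows how a total-order-style partitioner that assigns row
keys to reducers by comparing them against region split points collapses
every record onto a single reducer when the comparator reports all keys as
equal, leaving one oversized HFile for LoadIncrementalHFiles to split
afterwards.

import java.util.Comparator;

// Hypothetical sketch (not Phoenix source). Split points and comparators
// are made up purely to illustrate the mechanism discussed in the thread.
public class PartitionerSketch {

    // Made-up split points standing in for the table's region boundaries.
    static final byte[][] SPLITS = { {'b'}, {'d'}, {'f'}, {'h'} };

    // Plain unsigned lexicographic comparison of row keys.
    static final Comparator<byte[]> CORRECT = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    };

    // A degenerate comparator that reports every pair of keys as equal,
    // similar in effect to the bug described in PHOENIX-2649.
    static final Comparator<byte[]> BROKEN = (a, b) -> 0;

    // Conceptually what a total-order partitioner does: a key goes to the
    // reducer for the first split point it sorts before.
    static int partition(byte[] rowKey, Comparator<byte[]> cmp) {
        for (int i = 0; i < SPLITS.length; i++) {
            if (cmp.compare(rowKey, SPLITS[i]) < 0) return i;
        }
        return SPLITS.length;
    }

    public static void main(String[] args) {
        byte[][] keys = { {'a'}, {'c'}, {'e'}, {'g'}, {'z'} };
        for (byte[] k : keys) {
            System.out.printf("key=%c  correct -> reducer %d  broken -> reducer %d%n",
                    (char) k[0], partition(k, CORRECT), partition(k, BROKEN));
        }
        // With CORRECT the five sample keys spread across all five reducers;
        // with BROKEN every key lands in the last reducer, which writes a
        // single HFile that must later be split across regions during the
        // handoff step.
    }
}

With the correct comparator the sample keys map to reducers 0 through 4;
with the broken one they all map to reducer 4, which matches the
single-HFile, recursive-splitting behaviour visible in the log excerpt
above.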
