Sergey,

I can't access the cluster right now, so I'll post the details and configurations next week. The important facts, as far as I remember:

- 8-node dev cluster (Hadoop 2.7.1, HBase 1.1.3, Phoenix 4.7.0 RC2 and ZooKeeper 3.4.6)
  * 32 cores / 256 GB RAM per node; DataNode/NodeManager and RegionServer co-located on the same node; 24 GB heap assigned to each region server
- Number of tables: 9
- Salted with 5, 10 or 20 buckets
- Compressed using the Snappy codec
- Data ingestion: 30 ~ 40 GB / day using bulk loading
- Schema: the table I mentioned has 10 columns; 7 are VARCHAR and the rest are VARCHAR[]

I can see the same bulk load performance degradation on the other tables as well.
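From memory, the DDL looks roughly like this. This is only a sketch: the column names and the primary key constraint below are placeholders, not the real schema, and I'll post the exact DDL next week.

```sql
-- Rough sketch from memory; column names and the PK are placeholders.
-- SALT_BUCKETS is 5, 10 or 20 depending on the table, and all tables
-- use Snappy compression.
CREATE TABLE IF NOT EXISTS HYNIX.BIG_TRACE_SUMMARY (
    C1  VARCHAR NOT NULL,
    C2  VARCHAR,
    C3  VARCHAR,
    C4  VARCHAR,
    C5  VARCHAR,
    C6  VARCHAR,
    C7  VARCHAR,
    C8  VARCHAR[],
    C9  VARCHAR[],
    C10 VARCHAR[],
    CONSTRAINT PK PRIMARY KEY (C1)
) SALT_BUCKETS = 20, COMPRESSION = 'SNAPPY';
```

There are no secondary indexes on this table as far as I recall.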
Thanks,
Youngwoo

On Fri, Feb 26, 2016 at 6:02 PM, Sergey Soldatov <[email protected]> wrote:

> Hi Youngwoo,
> Could you provide a bit more information about the table structure
> (DDL would be great)? Do you have indexes?
>
> Thanks,
> Sergey
>
> On Tue, Feb 23, 2016 at 10:18 PM, 김영우 (Youngwoo Kim)
> <[email protected]> wrote:
> > Gabriel,
> >
> > I'm using RC2.
> >
> > Youngwoo
> >
> > On Wednesday, February 24, 2016, Gabriel Reid <[email protected]> wrote:
> >
> >> Hi Youngwoo,
> >>
> >> Which RC are you using for this? RC-1 or RC-2?
> >>
> >> Thanks,
> >>
> >> Gabriel
> >>
> >> On Tue, Feb 23, 2016 at 11:30 AM, 김영우 (YoungWoo Kim) <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > I'm evaluating the 4.7.0 RC on my dev cluster. It looks like it works
> >> > fine, but I run into a performance degradation with MR-based bulk
> >> > loading. I've been loading millions of rows per day into a Phoenix
> >> > table. With the 4.7.0 RC, there are failed jobs with a '600 sec'
> >> > timeout in the map or reduce stage. Logs as follows:
> >> >
> >> > 16/02/22 18:03:45 INFO mapreduce.Job: Task Id :
> >> > attempt_1456035298774_0066_m_000002_0, Status : FAILED
> >> > AttemptID:attempt_1456035298774_0066_m_000002_0 Timed out after 600 secs
> >> >
> >> > 16/02/22 18:05:14 INFO mapreduce.LoadIncrementalHFiles: HFile at
> >> > hdfs://fcbig/tmp/74da7ab1-a8ac-4ba8-9d43-0b70f08f8602/HYNIX.BIG_TRACE_SUMMARY/0/_tmp/_tmp/f305427aa8304cf98355bf01c1edb5ce.top
> >> > no longer fits inside a single region. Splitting...
> >> >
> >> > These log messages never appeared before, and I'm seeing about 5 ~ 10x
> >> > performance degradation for bulk loading (4.6.0: 10 min, but 60+ min
> >> > with the 4.7.0 RC). Furthermore, I can't find a clue in the MR logs
> >> > as to why the tasks failed.
> >> >
> >> > Also, I can see the HFile splitting happening after the reduce stage.
> >> > Is that normal?
> >> >
> >> > My env is:
> >> > - Hadoop 2.7.1
> >> > - HBase 1.1.3
> >> > - Phoenix 4.7.0 RC
> >> >
> >> > Thanks,
> >> >
> >> > Youngwoo
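P.S. For anyone else hitting the same failures: the "600 secs" in the log is Hadoop's default mapreduce.task.timeout (600000 ms). While the slowdown itself is investigated, the timeout can be raised per job so the tasks at least complete; the value of 1800000 below is just an example, and this only hides the symptom, it does not fix the underlying degradation.

```xml
<!-- mapred-site.xml, or pass -Dmapreduce.task.timeout=1800000 on the
     job command line. Value is in milliseconds; 0 disables the timeout.
     Workaround only: it lets slow tasks finish instead of being killed. -->
<property>
  <name>mapreduce.task.timeout</name>
  <value>1800000</value>
</property>
```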
