Have you tried writing out an HFile and then bulk loading the data?

On Jan 4, 2014 4:01 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
> bq. Output is written to either Hbase
>
> Looks like Akhtar wants to boost write performance to HBase.
> MapReduce over snapshot files targets higher read throughput.
>
> Cheers
>
>
> On Sat, Jan 4, 2014 at 12:55 PM, Vladimir Rodionov
> <vrodio...@carrieriq.com> wrote:
>
> > You can try MapReduce over snapshot files
> > https://issues.apache.org/jira/browse/HBASE-8369
> >
> > but you will need to patch 0.94.
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: vrodio...@carrieriq.com
> >
> > ________________________________________
> > From: Akhtar Muhammad Din [akhtar.m...@gmail.com]
> > Sent: Saturday, January 04, 2014 12:44 PM
> > To: user@hbase.apache.org
> > Subject: Re: Hbase Performance Issue
> >
> > I'm using CDH 4.5:
> > Hadoop: 2.0.0-cdh4.5.0
> > HBase: 0.94.6-cdh4.5.0
> >
> > Regards
> >
> >
> > On Sun, Jan 5, 2014 at 1:24 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > What version of HBase / HDFS are you running with?
> > >
> > > Cheers
> > >
> > >
> > > On Sat, Jan 4, 2014 at 12:17 PM, Akhtar Muhammad Din
> > > <akhtar.m...@gmail.com> wrote:
> > >
> > > > Hi,
> > > > I have been running a MapReduce job that joins 2 datasets of 1.3 and 4 GB
> > > > in size. Joining is done at reduce side. Output is written to either HBase
> > > > or HDFS depending upon configuration. The problem I am having is that HBase
> > > > takes about 60-80 minutes to write the processed data; on the other hand,
> > > > HDFS takes only 3-5 minutes to write the same data. I really want to improve
> > > > the HBase speed and bring it down to 1-2 minutes.
> > > >
> > > > I am using Amazon EC2 instances, launched a cluster of size 3 and later 10,
> > > > and have tried both c3.4xlarge and c3.8xlarge instances.
> > > >
> > > > I can see a significant increase in performance while writing to HDFS as I
> > > > use a cluster with more nodes and higher specifications, but in the case of
> > > > HBase there was no significant change in performance.
> > > >
> > > > I have been going through different posts and articles and have read the
> > > > HBase book to solve the HBase performance issue, but have not been able to
> > > > succeed so far.
> > > > Here are a few things I have tried out so far:
> > > >
> > > > *Client Side*
> > > > - Turned off writing to WAL
> > > > - Experimented with write buffer size
> > > > - Turned off auto flush on table
> > > > - Used cache, experimented with different sizes
> > > >
> > > >
> > > > *HBase Server Side*
> > > > - Increased region servers heap size to 8 GB
> > > > - Experimented with handlers count
> > > > - Increased Memstore flush size to 512 MB
> > > > - Experimented with hbase.hregion.max.filesize, tried different sizes
> > > >
> > > > There are many other parameters I have tried out following the suggestions
> > > > from different sources, but nothing has worked so far.
> > > >
> > > > Your help will be really appreciated.
> > > >
> > > > --
> > > > Regards
> > > > Akhtar Muhammad Din
> > > >
> > >
> >
> >
> > --
> > Regards
> > Akhtar Muhammad Din
> >
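
For anyone following the HFile bulk-load suggestion: the job has to write files whose row keys are sorted and fall inside a single region's key range, so that each file can be handed to one region. In HBase 0.94 that setup is done for you by HFileOutputFormat.configureIncrementalLoad, and the finished files are loaded with LoadIncrementalHFiles. The sketch below only illustrates the partitioning idea in plain Java with no HBase classes; the class name, method names, and sample split keys are all hypothetical and are not taken from the thread or from HBase itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: map a row key to the region that would hold it,
// given the sorted start keys of the table's regions.
final class RegionPartitionSketch {

    // Compare row keys byte by byte, treating each byte as unsigned.
    // HBase orders row keys this way (lexicographic on raw bytes).
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // startKeys must be sorted and begin with the empty key (the first region).
    // Returns the index of the last start key that is <= rowKey.
    static int regionFor(byte[] rowKey, List<byte[]> startKeys) {
        int lo = 0;
        int hi = startKeys.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always shrinks
            if (compareKeys(startKeys.get(mid), rowKey) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed split points for a table pre-split into four regions.
        List<byte[]> startKeys = Arrays.asList(key(""), key("g"), key("n"), key("t"));
        for (String row : new String[] {"apple", "grape", "nectarine", "zebra"}) {
            System.out.println(row + " -> region " + regionFor(key(row), startKeys));
        }
    }
}
```

If all rows land in one region (for example, a table that was never pre-split), every write goes to a single region server regardless of cluster size, so checking the table's split points is worth doing before or alongside trying bulk load.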