Have you tried writing out HFiles and then bulk loading the data?
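
For context: on 0.94 a bulk load is usually set up with HFileOutputFormat.configureIncrementalLoad and completed with LoadIncrementalHFiles (the completebulkload tool), so the data skips the normal WAL/memstore write path. The requirement is that each reducer emits rows in sorted order and that all of its rows fall inside a single region. The sketch below only illustrates that partitioning idea in plain Java, with no HBase dependency. The class name, the split points, and the sample row keys are all made up for the example, so treat it as a rough illustration rather than what HBase does internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these names come from HBase.
public class BulkLoadPartitionSketch {

    // Helper: encode a row key as UTF-8 bytes.
    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Compare row keys lexicographically as unsigned bytes,
    // which is the order HBase keeps rows in.
    public static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Return the index of the last region whose start key is <= row.
    // startKeys must be sorted and the first entry must be the empty key.
    public static int regionFor(byte[] row, byte[][] startKeys) {
        int lo = 0;
        int hi = startKeys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (compareRows(startKeys[mid], row) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Group rows by target region, then sort each group, mimicking what
    // each reducer would need to hand to an HFile writer.
    public static List<List<String>> partition(String[] rows, byte[][] startKeys) {
        List<List<String>> perRegion = new ArrayList<List<String>>();
        for (int i = 0; i < startKeys.length; i++) {
            perRegion.add(new ArrayList<String>());
        }
        for (String row : rows) {
            perRegion.get(regionFor(bytes(row), startKeys)).add(row);
        }
        for (List<String> bucket : perRegion) {
            bucket.sort((x, y) -> compareRows(bytes(x), bytes(y)));
        }
        return perRegion;
    }

    public static void main(String[] args) {
        // Assumed split points for a table pre-split into four regions.
        byte[][] startKeys = { bytes(""), bytes("g"), bytes("n"), bytes("t") };
        String[] rows = { "tango", "alpha", "mike", "golf", "zulu", "november" };
        List<List<String>> buckets = partition(rows, startKeys);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("region " + i + ": " + buckets.get(i));
        }
    }
}
```

One consequence of this scheme is that the number of reducers follows the number of regions, so a table with a single region would still funnel everything through one reducer. Pre-splitting the table before the job is worth considering.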
On Jan 4, 2014 4:01 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:

> bq. Output is written to either Hbase
>
> Looks like Akhtar wants to boost write performance to HBase.
> MapReduce over snapshot files targets higher read throughput.
>
> Cheers
>
>
> On Sat, Jan 4, 2014 at 12:55 PM, Vladimir Rodionov
> <vrodio...@carrieriq.com> wrote:
>
> > You can try MapReduce over snapshot files
> > https://issues.apache.org/jira/browse/HBASE-8369
> >
> > but you will need to patch 0.94.
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: vrodio...@carrieriq.com
> >
> > ________________________________________
> > From: Akhtar Muhammad Din [akhtar.m...@gmail.com]
> > Sent: Saturday, January 04, 2014 12:44 PM
> > To: user@hbase.apache.org
> > Subject: Re: Hbase Performance Issue
> >
> > I'm using CDH 4.5:
> > Hadoop:  2.0.0-cdh4.5.0
> > HBase:   0.94.6-cdh4.5.0
> >
> > Regards
> >
> >
> > On Sun, Jan 5, 2014 at 1:24 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > What version of HBase / HDFS are you running?
> > >
> > > Cheers
> > >
> > >
> > >
> > > On Sat, Jan 4, 2014 at 12:17 PM, Akhtar Muhammad Din
> > > > <akhtar.m...@gmail.com> wrote:
> > >
> > > > Hi,
> > > > I have been running a MapReduce job that joins 2 datasets of 1.3 and
> > > > 4 GB in size. Joining is done on the reduce side. Output is written to
> > > > either Hbase or HDFS depending upon configuration. The problem I am
> > > > having is that Hbase takes about 60-80 minutes to write the processed
> > > > data, whereas HDFS takes only 3-5 minutes to write the same data. I
> > > > really want to improve the Hbase speed and bring it down to 1-2 minutes.
> > > >
> > > > I am using Amazon EC2 instances. I launched a cluster of 3 nodes and
> > > > later one of 10, and have tried both c3.4xlarge and c3.8xlarge
> > > > instances.
> > > >
> > > > I can see a significant increase in performance when writing to HDFS
> > > > as I use clusters with more nodes and higher specifications, but in the
> > > > case of Hbase there was no significant change in performance.
> > > >
> > > > I have been going through different posts and articles, and have read
> > > > the Hbase book, to solve the Hbase performance issue, but have not
> > > > succeeded so far.
> > > > Here are a few of the things I have tried so far:
> > > >
> > > > *Client Side*
> > > > - Turned off writing to WAL
> > > > - Experimented with write buffer size
> > > > - Turned off auto flush on table
> > > > - Used cache, experimented with different sizes
> > > >
> > > >
> > > > *Hbase Server Side*
> > > > - Increased region servers heap size to 8 GB
> > > > - Experimented with handlers count
> > > > - Increased Memstore flush size to 512 MB
> > > > - Experimented with hbase.hregion.max.filesize, tried different sizes
> > > >
> > > > There are many other parameters I have tried, following suggestions
> > > > from different sources, but nothing has worked so far.
> > > >
> > > > Your help will be really appreciated.
> > > >
> > > > --
> > > > Regards
> > > > Akhtar Muhammad Din
> > > >
> > >
> >
> >
> >
> > --
> > Regards
> > Akhtar Muhammad Din
> >
> >
>
