Could you share a region server log captured while a job is running?
On Jan 4, 2014 4:35 PM, "Akhtar Muhammad Din" <akhtar.m...@gmail.com> wrote:

> Thanks guys for your precious time.
> Vladimir, as Ted rightly said, I want to improve write performance for now
> (of course I want to read the data as fast as possible later on).
> Kevin, my current understanding of bulk load is that you generate
> StoreFiles and later load them through a command line program. I don't want
> any manual step. Our system receives data every 15 minutes, so the
> requirement is to automate it completely through the client API.
>
>
>
> On Sun, Jan 5, 2014 at 2:19 AM, Kevin O'dell <kevin.od...@cloudera.com>
> wrote:
>
> > Have you tried writing out an hfile and then bulk loading the data?
> > On Jan 4, 2014 4:01 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
> >
> > > bq. Output is written to either Hbase
> > >
> > > Looks like Akhtar wants to boost write performance to HBase.
> > > MapReduce over snapshot files targets higher read throughput.
> > >
> > > Cheers
> > >
> > >
> > > > On Sat, Jan 4, 2014 at 12:55 PM, Vladimir Rodionov
> > > > <vrodio...@carrieriq.com> wrote:
> > >
> > > > > You can try MapReduce over snapshot files
> > > > https://issues.apache.org/jira/browse/HBASE-8369
> > > >
> > > > but you will need to patch 0.94.
> > > >
> > > > Best regards,
> > > > Vladimir Rodionov
> > > > Principal Platform Engineer
> > > > Carrier IQ, www.carrieriq.com
> > > > e-mail: vrodio...@carrieriq.com
> > > >
> > > > ________________________________________
> > > > From: Akhtar Muhammad Din [akhtar.m...@gmail.com]
> > > > Sent: Saturday, January 04, 2014 12:44 PM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: Hbase Performance Issue
> > > >
> > > > > I'm using CDH 4.5:
> > > > Hadoop:  2.0.0-cdh4.5.0
> > > > HBase:   0.94.6-cdh4.5.0
> > > >
> > > > Regards
> > > >
> > > >
> > > > On Sun, Jan 5, 2014 at 1:24 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > > > What version of HBase / HDFS are you running?
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > >
> > > > > > On Sat, Jan 4, 2014 at 12:17 PM, Akhtar Muhammad Din
> > > > > > <akhtar.m...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > > > I have been running a map reduce job that joins 2 datasets of 1.3
> > > > > > > and 4 GB in size. Joining is done at reduce side. Output is written
> > > > > > > to either Hbase or HDFS depending upon configuration. The problem I
> > > > > > > am having is that Hbase takes about 60-80 minutes to write the
> > > > > > > processed data, whereas HDFS takes only 3-5 minutes to write the
> > > > > > > same data. I really want to improve the Hbase speed and bring it
> > > > > > > down to 1-2 minutes.
> > > > > >
> > > > > > > I am using Amazon EC2 instances. I launched a cluster of 3 nodes
> > > > > > > and later one of 10, and have tried both c3.4xlarge and c3.8xlarge
> > > > > > > instances.
> > > > > >
> > > > > > > I can see a significant increase in performance when writing to
> > > > > > > HDFS as I use a cluster with more nodes and higher specifications,
> > > > > > > but in the case of Hbase there was no significant change in
> > > > > > > performance.
> > > > > >
> > > > > > > I have been going through different posts and articles and have
> > > > > > > read the Hbase book to solve the Hbase performance issue, but have
> > > > > > > not succeeded so far.
> > > > > > > Here are the things I have tried so far:
> > > > > >
> > > > > > *Client Side*
> > > > > > - Turned off writing to WAL
> > > > > > - Experimented with write buffer size
> > > > > > - Turned off auto flush on table
> > > > > > - Used cache, experimented with different sizes
> > > > > >
> > > > > >
> > > > > > *Hbase Server Side*
> > > > > > - Increased region servers heap size to 8 GB
> > > > > > - Experimented with handlers count
> > > > > > - Increased Memstore flush size to 512 MB
> > > > > > > - Experimented with hbase.hregion.max.filesize, tried different sizes
> > > > > >
> > > > > > > There are many other parameters I have tried out following
> > > > > > > suggestions from different sources, but nothing has worked so far.
> > > > > >
> > > > > > Your help will be really appreciated.
> > > > > >
> > > > > > --
> > > > > > Regards
> > > > > > Akhtar Muhammad Din
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards
> > > > Akhtar Muhammad Din
> > > >
> > > >
> > >
> >
>
>
>
> --
> Regards
> Akhtar Muhammad Din
>
