Re: performance help

Irfan Mohammed Mon, 06 Jul 2009 12:57:33 -0700

Writing to HDFS directly took just 21 seconds, so I suspect that I am doing 
something incorrectly in my HBase setup or my code.


Thanks for the help.

[2009-07-06 15:52:47,917] 09/07/06 15:52:22 INFO mapred.FileInputFormat: Total input paths to process : 10
09/07/06 15:52:22 INFO mapred.JobClient: Running job: job_200907052205_0235
09/07/06 15:52:23 INFO mapred.JobClient:  map 0% reduce 0%
09/07/06 15:52:37 INFO mapred.JobClient:  map 7% reduce 0%
09/07/06 15:52:43 INFO mapred.JobClient:  map 100% reduce 0%
09/07/06 15:52:47 INFO mapred.JobClient: Job complete: job_200907052205_0235
09/07/06 15:52:47 INFO mapred.JobClient: Counters: 9
09/07/06 15:52:47 INFO mapred.JobClient:   Job Counters 
09/07/06 15:52:47 INFO mapred.JobClient:     Rack-local map tasks=4
09/07/06 15:52:47 INFO mapred.JobClient:     Launched map tasks=10
09/07/06 15:52:47 INFO mapred.JobClient:     Data-local map tasks=6
09/07/06 15:52:47 INFO mapred.JobClient:   FileSystemCounters
09/07/06 15:52:47 INFO mapred.JobClient:     HDFS_BYTES_READ=57966580
09/07/06 15:52:47 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=587539988
09/07/06 15:52:47 INFO mapred.JobClient:   Map-Reduce Framework
09/07/06 15:52:47 INFO mapred.JobClient:     Map input records=294786
09/07/06 15:52:47 INFO mapred.JobClient:     Spilled Records=0
09/07/06 15:52:47 INFO mapred.JobClient:     Map input bytes=57966580
09/07/06 15:52:47 INFO mapred.JobClient:     Map output records=1160144
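
As a rough back-of-the-envelope check, the counters above can be turned into rates. This is only a sketch: the 21 seconds is the figure quoted in this message (the log timestamps themselves span about 25 seconds from "Running job" to "Job complete"), and the variable names are made up for illustration.

```python
# Figures copied from the job counters in the log above.
HDFS_BYTES_WRITTEN = 587_539_988
MAP_INPUT_RECORDS = 294_786
MAP_OUTPUT_RECORDS = 1_160_144

# Assumption: wall time as quoted in the message, not derived from counters.
ELAPSED_SECONDS = 21

# Write rate to HDFS in megabytes (10^6 bytes) per second.
write_mb_per_s = HDFS_BYTES_WRITTEN / ELAPSED_SECONDS / 1e6

# Output records emitted per second across all map tasks.
records_per_s = MAP_OUTPUT_RECORDS / ELAPSED_SECONDS

# Output records produced per input record.
fan_out = MAP_OUTPUT_RECORDS / MAP_INPUT_RECORDS

# Average bytes written per output record.
bytes_per_record = HDFS_BYTES_WRITTEN / MAP_OUTPUT_RECORDS

print(f"HDFS write rate:       {write_mb_per_s:.1f} MB/s")
print(f"Output records/second: {records_per_s:.0f}")
print(f"Outputs per input:     {fan_out:.2f}")
print(f"Bytes per output:      {bytes_per_record:.0f}")
```

That works out to roughly 28 MB/s and about 55,000 output records per second, with close to four output records per input record. The last figure would be consistent with one write per table per input row, though that is only an inference from the counters.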

----- Original Message -----
From: "stack" <st...@duboce.net>
To: hbase-dev@hadoop.apache.org
Sent: Monday, July 6, 2009 2:36:35 PM GMT -05:00 US/Canada Eastern
Subject: Re: performance help

Sorry, yeah, that'd be 4 tables.  So, yeah, it would seem you only have one
region in each table.  Your cells are small so that's probably about right.

So, an hbase client is contacting 4 different servers to do each update.
And running with one table made no difference to overall time?

St.Ack

On Mon, Jul 6, 2009 at 11:24 AM, Irfan Mohammed <irfan...@gmail.com> wrote:

> Input is 1 file.
>
> These are 4 different tables "txn_m1", "txn_m2", "txn_m3", "txn_m4". To me,
> it looks like it is always doing 1 region per table and these tables are
> always on different regionservers. I have never seen the same table on different
> regionservers. Does that sound right?
>
> ----- Original Message -----
> From: "stack" <st...@duboce.net>
> To: hbase-dev@hadoop.apache.org
> Sent: Monday, July 6, 2009 2:14:43 PM GMT -05:00 US/Canada Eastern
> Subject: Re: performance help
>
> On Mon, Jul 6, 2009 at 11:06 AM, Irfan Mohammed <irfan...@gmail.com>
> wrote:
>
> > I am working on writing to HDFS files. Will update you by end of day
> > today.
> >
> > There are always 10 concurrent mappers running. I keep calling
> > setNumMaps(5) and also setting the following properties in
> > mapred-site.xml to 3, but I still end up running 10 concurrent maps.
> >
>
>
> Is your input ten files?
>
>
> >
> > There are 5 regionservers and the online regions are as follows :
> >
> > m1 : -ROOT-,,0
> > m2 : txn_m1,,1245462904101
> > m3 : txn_m4,,1245462942282
> > m4 : txn_m2,,1245462890248
> > m5 : .META.,,1
> >     txn_m3,,1245460727203
> >
>
>
> So, that looks like 4 regions from table txn?
>
> So that's about 1 region per regionserver?
>
>
> > I have tried setAutoFlush(false) and also writeToWal(false), with the
> > same behaviour.
> >
>
> If you did the above and it still takes 10 minutes, then that would seem
> to rule out hbase (batching should have a big impact on uploads, and
> setting writeToWAL to false should then double throughput over whatever
> you were seeing previously).
>
> St.Ack
>
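
To illustrate why batching is expected to matter so much, here is a toy model of a client-side write buffer. It is not the HBase client API: the class, the function, and the count-based flush threshold are all hypothetical, and the real client decides when to flush differently. It only counts how many round trips a given number of puts would cost with and without buffering.

```python
class BufferedWriter:
    """Toy stand-in for a client with a write buffer (hypothetical, not HBase)."""

    def __init__(self, flush_threshold):
        # Number of buffered puts that triggers a flush; 1 means no batching.
        self.flush_threshold = flush_threshold
        self.pending = []
        self.round_trips = 0

    def put(self, row):
        # Buffer the put and flush once the threshold is reached.
        self.pending.append(row)
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # One simulated round trip sends everything currently buffered.
        if self.pending:
            self.round_trips += 1
            self.pending.clear()


def count_round_trips(num_puts, flush_threshold):
    """Return how many simulated round trips num_puts would cost."""
    writer = BufferedWriter(flush_threshold)
    for i in range(num_puts):
        writer.put(i)
    writer.flush()  # send whatever is left in the buffer at the end
    return writer.round_trips


unbuffered = count_round_trips(10_000, 1)
buffered = count_round_trips(10_000, 500)
print(f"unbuffered: {unbuffered} round trips, buffered: {buffered} round trips")
```

In this model the per-put round trip cost dominates when nothing is buffered, which is why turning auto-flush off would normally be expected to show up clearly in upload time.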
