Writing to HDFS directly took just 21 seconds, so I suspect that I am doing something incorrectly in my HBase setup or my code.
Thanks for the help.

[2009-07-06 15:52:47,917]
09/07/06 15:52:22 INFO mapred.FileInputFormat: Total input paths to process : 10
09/07/06 15:52:22 INFO mapred.JobClient: Running job: job_200907052205_0235
09/07/06 15:52:23 INFO mapred.JobClient: map 0% reduce 0%
09/07/06 15:52:37 INFO mapred.JobClient: map 7% reduce 0%
09/07/06 15:52:43 INFO mapred.JobClient: map 100% reduce 0%
09/07/06 15:52:47 INFO mapred.JobClient: Job complete: job_200907052205_0235
09/07/06 15:52:47 INFO mapred.JobClient: Counters: 9
09/07/06 15:52:47 INFO mapred.JobClient: Job Counters
09/07/06 15:52:47 INFO mapred.JobClient: Rack-local map tasks=4
09/07/06 15:52:47 INFO mapred.JobClient: Launched map tasks=10
09/07/06 15:52:47 INFO mapred.JobClient: Data-local map tasks=6
09/07/06 15:52:47 INFO mapred.JobClient: FileSystemCounters
09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=587539988
09/07/06 15:52:47 INFO mapred.JobClient: Map-Reduce Framework
09/07/06 15:52:47 INFO mapred.JobClient: Map input records=294786
09/07/06 15:52:47 INFO mapred.JobClient: Spilled Records=0
09/07/06 15:52:47 INFO mapred.JobClient: Map input bytes=57966580
09/07/06 15:52:47 INFO mapred.JobClient: Map output records=1160144

----- Original Message -----
From: "stack" <st...@duboce.net>
To: hbase-dev@hadoop.apache.org
Sent: Monday, July 6, 2009 2:36:35 PM GMT -05:00 US/Canada Eastern
Subject: Re: performance help

Sorry, yeah, that'd be 4 tables. So, yeah, it would seem you only have one
region in each table. Your cells are small so that's probably about right.

So, an hbase client is contacting 4 different servers to do each update.
And running with one table made no difference to overall time?

St.Ack

On Mon, Jul 6, 2009 at 11:24 AM, Irfan Mohammed <irfan...@gmail.com> wrote:

> Input is 1 file.
>
> These are 4 different tables "txn_m1", "txn_m2", "txn_m3", "txn_m4". To me,
> it looks like it is always doing 1 region per table and these tables are
> always on different regionservers. I have never seen the same table on
> different regionservers. Does that sound right?
>
> ----- Original Message -----
> From: "stack" <st...@duboce.net>
> To: hbase-dev@hadoop.apache.org
> Sent: Monday, July 6, 2009 2:14:43 PM GMT -05:00 US/Canada Eastern
> Subject: Re: performance help
>
> On Mon, Jul 6, 2009 at 11:06 AM, Irfan Mohammed <irfan...@gmail.com>
> wrote:
>
> > I am working on writing to HDFS files. Will update you by end of day
> > today.
> >
> > There are always 10 concurrent mappers running. I keep setting the
> > setNumMaps(5) and also the following properties in mapred-site.xml to 3
> > but still end up running 10 concurrent maps.
> >
>
> Is your input ten files?
>
> >
> > There are 5 regionservers and the online regions are as follows :
> >
> > m1 : -ROOT-,,0
> > m2 : txn_m1,,1245462904101
> > m3 : txn_m4,,1245462942282
> > m4 : txn_m2,,1245462890248
> > m5 : .META.,,1
> >      txn_m3,,1245460727203
> >
>
> So, that looks like 4 regions from table txn?
>
> So that's about 1 region per regionserver?
>
> > I have setAutoFlush(false) and also writeToWal(false) with the same
> > behaviour.
> >
>
> If you did above and still takes 10 minutes, then that would seem to rule
> out hbase (batching should have big impact on uploads and then setting
> writeToWAL to false, should double throughput over whatever you were seeing
> previous).
>
> St.Ack
>
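
For a rough sense of the gap between the two runs discussed in this thread, the quoted figures can be turned into throughput numbers. This is only a back-of-the-envelope sketch: the record and byte counts come from the job counters above, the 21 seconds and 10 minutes are the timings mentioned in the messages, and it assumes (this is not stated anywhere in the thread) that the HBase run emits the same number of records as the HDFS run.

```python
# Figures copied from the job counters and timings quoted in this thread.
MAP_OUTPUT_RECORDS = 1160144     # "Map output records"
HDFS_BYTES_WRITTEN = 587539988   # "HDFS_BYTES_WRITTEN"
HDFS_RUN_SECONDS = 21            # reported time for the direct-to-HDFS run
HBASE_RUN_SECONDS = 10 * 60      # "still takes 10 minutes"

# Throughput of the direct-to-HDFS run.
hdfs_records_per_sec = MAP_OUTPUT_RECORDS / HDFS_RUN_SECONDS
hdfs_mib_per_sec = HDFS_BYTES_WRITTEN / HDFS_RUN_SECONDS / (1024 * 1024)

# ASSUMPTION: the HBase run writes the same number of records.
hbase_records_per_sec = MAP_OUTPUT_RECORDS / HBASE_RUN_SECONDS

# How many times longer the HBase run takes than the HDFS run.
slowdown = HBASE_RUN_SECONDS / HDFS_RUN_SECONDS

print(f"HDFS run : {hdfs_records_per_sec:,.0f} records/s, {hdfs_mib_per_sec:.1f} MiB/s")
print(f"HBase run: {hbase_records_per_sec:,.0f} records/s (assumed same record count)")
print(f"HBase run takes about {slowdown:.0f}x as long")
```

If the record counts differ between the two jobs, only the wall-clock ratio remains meaningful.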