Hi Anil,

Yes, I'm sure I'm running the cluster in distributed mode (I see 21 parallel map tasks in the job tracker and processes on each node). Max map tasks is set to 7 for each node.

I ran my job with the same cluster configuration on two tables:

1. A table located on only 1 node (I see it on the HBase master page) - 10M records
2. A table evenly distributed across 3 nodes (also checked on the HBase master page) - 150M records

Thanks.

On Sat, Mar 31, 2012 at 12:57 AM, anil gupta <anilg...@buffalo.edu> wrote:

> Hi Alexander,
>
> If you can provide more details about what you are doing, it would be
> helpful. Are you sure that your cluster is running in distributed mode? Did
> you run the job with 1 node in the cluster and then add 2 additional nodes
> to the same cluster?
>
> Thanks,
> Anil
>
> 2012/3/30 Alexander Goryunov <a.goryu...@gmail.com>
>
> > Hi Anil,
> >
> > Yes, the second table is distributed and the first is not, and I get 3x
> > better results for the non-distributed table.
> >
> > I use distributed Hadoop mode for all cases.
> >
> > Thanks.
> >
> > On Fri, Mar 30, 2012 at 3:26 AM, anil gupta <anilg...@buffalo.edu> wrote:
> >
> > > Hi Alexander,
> > >
> > > Is the data properly distributed over the cluster in distributed mode?
> > > If it is not, you won't get good results in distributed mode.
> > >
> > > Thanks,
> > > Anil Gupta
> > >
> > > On Thu, Mar 29, 2012 at 8:37 AM, Alexander Goryunov <a.goryu...@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm running a cluster of 3 data nodes (8-core Xeon, 16G) + 1 node for
> > > > the jobtracker and namenode, with Hadoop and HBase, and I see strange
> > > > performance results.
> > > >
> > > > The same map job runs at about 300 000 records per second for the
> > > > 1-node table and 100 000 records per second for the table distributed
> > > > across 3 nodes.
> > > >
> > > > Scan caching is 1000, each row is about 0.2K, compression is off, and
> > > > setCacheBlocks is false.
> > > >
> > > > 7 map tasks run in parallel on each node (281 in total for the big
> > > > table and 16 for the small table).
> > > >
> > > > The map job reads some sequential data and writes out a small part of
> > > > it. No reduce tasks are set for this job.
> > > >
> > > > Both tables have the same data and have sizes of about 10M records
> > > > (the first one) and 150M records (the second one).
> > > >
> > > > Do you have any idea what could be the reason for such behavior?
> > > >
> > > > Thanks.
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Gupta
>
> --
> Thanks & Regards,
> Anil Gupta
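
For reference, the figures quoted in this thread can be turned into rough job durations and per-task rates. The following is a minimal Python sketch and not part of the original exchange. It assumes the quoted rates (300 000 and 100 000 records per second) are aggregate throughput for the whole job, and that at most 21 map tasks (3 nodes x 7 slots) run at once. The names `JOBS`, `SLOTS`, and `summarize` are made up for illustration.

```python
import math

# Figures as quoted in the thread; nothing here was measured independently.
JOBS = {
    "small (1 node)": {"records": 10_000_000, "rate": 300_000, "map_tasks": 16},
    "big (3 nodes)": {"records": 150_000_000, "rate": 100_000, "map_tasks": 281},
}

# Assumption: 3 data nodes x 7 map slots each, all usable by the job.
SLOTS = 3 * 7


def summarize(job):
    """Derive rough timings from records, aggregate rate, and task count."""
    # Upper bound on tasks that can run at the same time.
    concurrent = min(job["map_tasks"], SLOTS)
    return {
        # Total job time if the quoted rate is aggregate throughput.
        "runtime_s": job["records"] / job["rate"],
        # Average number of records each map task has to read.
        "records_per_task": job["records"] / job["map_tasks"],
        "concurrent": concurrent,
        # Average rate of one running task under the assumptions above.
        "rate_per_task": job["rate"] / concurrent,
        # Number of scheduling rounds needed to run every map task.
        "waves": math.ceil(job["map_tasks"] / SLOTS),
    }


for name, job in JOBS.items():
    s = summarize(job)
    print(
        f"{name}: {s['runtime_s']:.0f} s total, "
        f"{s['records_per_task']:.0f} records/task, "
        f"{s['rate_per_task']:.0f} records/s per running task, "
        f"{s['waves']} wave(s) of map tasks"
    )
```

Under these assumptions the small-table job lasts about 33 seconds in a single wave of map tasks, while the big-table job lasts about 1500 seconds across 14 waves.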