Hi Anil,

Yes, the second table is distributed and the first is not, and I get 3x better results with the non-distributed table.
I use distributed Hadoop mode for all cases.

Thanks.

On Fri, Mar 30, 2012 at 3:26 AM, anil gupta <anilg...@buffalo.edu> wrote:
> Hi Alexander,
>
> Is data properly distributed over the cluster in Distributed Mode? If it
> is not, then you won't get good results in distributed mode.
>
> Thanks,
> Anil Gupta
>
> On Thu, Mar 29, 2012 at 8:37 AM, Alexander Goryunov <a.goryu...@gmail.com> wrote:
> > Hello,
> >
> > I'm running a 3 data node cluster (8-core Xeon, 16G) + 1 node for the
> > jobtracker and namenode, with Hadoop and HBase, and I'm getting strange
> > performance results.
> >
> > The same map job runs at about 300 000 records per second for the
> > 1-node table and 100 000 records per second for the table distributed
> > across 3 nodes.
> >
> > Scan caching is 1000, each row is about 0.2K, compression is off,
> > setCacheBlocks is false.
> >
> > 7 map tasks run in parallel on each node (281 in total for the big table
> > and 16 for the small table).
> >
> > The map job reads some sequential data and writes out a few records from
> > it. No reduce tasks are set for this job.
> >
> > Both tables have the same data, with about 10M records (first one) and
> > 150M records (second one).
> >
> > Do you have any idea what could be the reason for such behavior?
> >
> > Thanks.
>
> --
> Thanks & Regards,
> Anil Gupta
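One way to answer Anil's question ("Is data properly distributed over the cluster?") is to count how many of the table's regions each region server hosts. Since HBase's TableInputFormat normally creates one map task per region, the 281 map tasks suggest roughly 281 regions for the big table. Below is a minimal sketch, assuming you have already collected the per-server region counts yourself (for example from the HBase master's status page). The functions `region_skew` and `is_balanced`, the 1.25 tolerance and the sample counts are all hypothetical illustrations, not part of HBase.

```python
# Hypothetical helper: measure how evenly an HBase table's regions are
# spread across region servers. The counts are inputs you collect yourself;
# nothing here talks to HBase.


def region_skew(regions_per_server: dict[str, int]) -> float:
    """Return the max regions on one server divided by the per-server mean.

    1.0 means perfectly even; N (the number of servers) means every region
    sits on a single server. Servers hosting zero regions must be included,
    otherwise they do not pull the mean down.
    """
    if not regions_per_server:
        raise ValueError("need at least one region server")
    counts = list(regions_per_server.values())
    total = sum(counts)
    if total == 0:
        raise ValueError("table has no regions")
    # max / (total / n), rearranged to avoid an intermediate float mean
    return max(counts) * len(counts) / total


def is_balanced(regions_per_server: dict[str, int], tolerance: float = 1.25) -> bool:
    """True if no server holds more than `tolerance` times its fair share."""
    return region_skew(regions_per_server) <= tolerance


if __name__ == "__main__":
    # Hypothetical region counts for a 281-region table on 3 region servers.
    all_on_one = {"node1": 281, "node2": 0, "node3": 0}
    spread = {"node1": 94, "node2": 94, "node3": 93}
    print(region_skew(all_on_one), is_balanced(all_on_one))  # 3.0 False
    print(round(region_skew(spread), 3), is_balanced(spread))  # 1.004 True
```

A skew close to the number of servers means one region server is doing all the work. A value near 1.0 only shows that region counts are even; request load per region can still be uneven, so it rules out one cause rather than proving the cluster is balanced.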