Hello, I'm running a 3-data-node cluster (8-core Xeon, 16 GB each) plus one separate node for the JobTracker and NameNode, with Hadoop and HBase, and I'm seeing strange performance results.
The same map job runs at about 300,000 records per second against a table held on a single node, but only about 100,000 records per second against a table distributed across all 3 nodes. Scan caching is 1000, each row is about 0.2 KB, compression is off, and setCacheBlocks is false. Each node runs 7 map tasks in parallel (281 tasks in total for the big table, 16 for the small one). The map job reads sequential data and writes out a small fraction of it; no reduce tasks are configured for this job.

Both tables hold the same kind of data; the first has about 10M records and the second about 150M. Do you have any idea what could be the reason for such behavior? Thanks.
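For reference, here is a minimal sketch of the job setup as described above (scan caching 1000, cacheBlocks off, no reducers), using the standard TableMapReduceUtil API. The table and mapper names are placeholders, not the actual ones used:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "scan-job");

Scan scan = new Scan();
scan.setCaching(1000);      // rows fetched per RPC, as in the job described
scan.setCacheBlocks(false); // don't pollute the block cache during a full scan

// "MyTable" and MyMapper are hypothetical placeholders
TableMapReduceUtil.initTableMapperJob("MyTable", scan, MyMapper.class,
    null, null, job);
job.setNumReduceTasks(0);   // map-only job, no reducers
```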