Hi Anil,

Yes, the second table is distributed and the first is not, and I get 3x
better results for the non-distributed table.

I use distributed Hadoop mode in all cases.
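
To put the figures from the original message (quoted below) side by side,
here is a rough back-of-the-envelope sketch. It assumes the reported speeds
are aggregate job throughput and that all 21 map slots (3 nodes x 7 tasks)
are available to both jobs; neither is confirmed in the thread, and the
function and variable names are only illustrative.

```python
import math

# Cluster figures quoted in the original message.
NODES = 3
MAP_SLOTS_PER_NODE = 7

# Approximate per-table figures quoted in the original message.
tables = {
    "small (1 node)": {"records": 10_000_000, "map_tasks": 16, "records_per_sec": 300_000},
    "big (3 nodes)": {"records": 150_000_000, "map_tasks": 281, "records_per_sec": 100_000},
}


def summarize(records, map_tasks, records_per_sec):
    """Derive simple per-job figures from the quoted numbers."""
    slots = NODES * MAP_SLOTS_PER_NODE  # assumed: every slot is usable by the job
    return {
        "records_per_task": records / map_tasks,
        "waves": math.ceil(map_tasks / slots),  # rounds of tasks needed to finish
        "est_runtime_sec": records / records_per_sec,  # assumes aggregate throughput
    }


for name, figures in tables.items():
    print(name, summarize(**figures))
```

Under these assumptions the small table fits in a single round of map tasks,
while the big table needs many rounds, so per-task startup and scheduling
costs are paid repeatedly. That may or may not account for the difference.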

Thanks.



On Fri, Mar 30, 2012 at 3:26 AM, anil gupta <anilg...@buffalo.edu> wrote:

> Hi Alexander,
>
> Is the data properly distributed over the cluster in distributed mode? If
> it is not, you won't get good results in distributed mode.
>
> Thanks,
> Anil Gupta
>
> On Thu, Mar 29, 2012 at 8:37 AM, Alexander Goryunov <a.goryu...@gmail.com> wrote:
>
> > Hello,
> >
> > I'm running a cluster of 3 data nodes (8-core Xeon, 16 GB) plus 1 node
> > for the jobtracker and namenode, with Hadoop and HBase, and I'm seeing
> > strange performance results.
> >
> > The same map job runs at about 300,000 records per second for the
> > 1-node table and 100,000 records per second for the table distributed
> > across 3 nodes.
> >
> > Scan caching is 1000, each row is about 0.2 KB, compression is off, and
> > setCacheBlocks is false.
> >
> > 7 map tasks run in parallel on each node (281 tasks in total for the big
> > table and 16 for the small table).
> >
> > The map job reads sequential data and writes out a small part of it. No
> > reduce tasks are set for this job.
> >
> >
> > Both tables have the same data and contain about 10M records (first
> > one) and 150M records (second one).
> >
> > Do you have any idea what could be the reason for this behavior?
> >
> > Thanks.
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
