Re: Map Reduce performance

Erik Holstad Wed, 24 Jun 2009 09:06:47 -0700

Hi Ramesh!
Have to agree with Tim about the size of your cluster. I'm honestly a little
bit surprised that you are actually seeing
that using MR on a single node is faster, since you only get the negative
sides from it (job setup and so on), but not
the good stuff.
I looked at the code and it looks good. You're not really doing too much in
the Job, but it doesn't look like you are doing
anything wrong. I do have some things you can think about, though, when you
get a bigger cluster up and running.
1. You might want to stay away from creating Text objects. We are internally
trying to move away from all usage of Text in HBase and just use
ImmutableBytesWritable or something like that.
2. Getting an HTable is expensive, so you might want to create a pool of
those connections that you can share, so you don't have to get a new one for
every task. I'm not 100% sure about the configure call, but I think it gives
you one per call, so it might be worth looking into.
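
To make point 2 a bit more concrete, here is a rough sketch of the pooling
idea in plain Java. It is only an illustration: SimplePool, borrow and
giveBack are made-up names, not HBase API, and the type parameter T stands in
for whatever expensive handle you want to share (an HTable in your case).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/**
 * Hypothetical fixed-size pool. T is a stand-in for an expensive handle
 * such as an HTable; nothing here comes from the HBase API.
 */
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the creation cost once, up front
        }
    }

    /** Waits until a handle is free, then hands it out. */
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag set
            throw new IllegalStateException("interrupted while waiting for a handle", e);
        }
    }

    /** Puts a handle back so another caller can reuse it. */
    void giveBack(T handle) {
        idle.offer(handle);
    }

    /** Number of handles currently sitting idle in the pool. */
    int available() {
        return idle.size();
    }
}
```

The idea would be to build the pool once, borrow a handle where you need it,
and give it back in a finally block, so the expensive creation happens a fixed
number of times instead of once per call.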


Erik
