Thanks Joep,

My table is empty when I start and will contain about 18M rows when the job
completes.

So I guess I need to understand how to pick row keys such that the regions
will be on each mapper's node. Any advice would be appreciated.
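One common way to get writes spread across all region servers from the start is to pre-split the table so it begins life with one region per expected key prefix, rather than a single region that splits lazily. Here is a rough sketch of generating split points for 10 salt prefixes ('0' through '9'); the class name and the choice of single-digit salt bytes are just assumptions for illustration, and the commented-out lines show roughly where the old HBaseAdmin API would take those splits:

```java
import java.util.Arrays;

// Sketch: generate split keys so a new table starts with one region per
// salt prefix. With salts '0'..'9', 9 split points yield 10 regions.
public class PreSplit {
    static byte[][] splitKeys(int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Region i covers rows whose key starts with salt digit i.
            splits[i - 1] = new byte[] { (byte) ('0' + i) };
        }
        return splits;
    }

    public static void main(String[] args) {
        byte[][] splits = splitKeys(10);
        System.out.println(splits.length); // 9 split points -> 10 regions
        // With the HBase client on the classpath, the pre-split table
        // would be created roughly like this (descriptor is hypothetical):
        // HBaseAdmin admin = new HBaseAdmin(conf);
        // admin.createTable(descriptor, splits);
    }
}
```

Pre-splitting avoids the hot-single-region phase entirely, but it only helps if the row keys actually carry the matching prefixes.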

BTW, I do notice that the region servers on other nodes become busy, but
only after a large number of rows have been processed - say 10%. It would
be better if I could deliberately control which regions/region servers were
going to be used, though, to avoid the network traffic of sending rows to
region servers on other nodes.
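One way to express that control in the row keys themselves is to derive a salt prefix from the map task's id, so that all puts from one mapper fall into a single key range (and, if the regions line up, a single region). This is only a sketch; the `taskId` parameter, the `|` separator, and the digit salt are assumptions for illustration:

```java
// Sketch: prefix each row key with the task's salt digit so all puts from
// one mapper land in one contiguous key range. With a table pre-split on
// the same digits, each mapper writes to exactly one region.
public class SaltedKey {
    static String buildRowKey(int taskId, String naturalKey) {
        // taskId 0..9 maps to salt '0'..'9'; "|" is an arbitrary separator.
        return (char) ('0' + taskId) + "|" + naturalKey;
    }

    public static void main(String[] args) {
        System.out.println(buildRowKey(3, "node-42")); // prints "3|node-42"
    }
}
```

Note that keeping writes on the mapper's local region server additionally requires that the region for that salt range happens to be hosted on the same node, which HBase does not guarantee; the salt only guarantees that each mapper talks to one region server rather than all of them.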

Jon

On Sun, Jun 3, 2012 at 12:02 PM, Joep Rottinghuis <jrottingh...@gmail.com> wrote:

> How large is your table?
> If it is newly created and still almost empty then it will probably
> consist of only one region, which will be hosted on one region server.
>
> Even as the table grows and gets split into multiple regions, you will
> have to split your mappers in such a way that each writes to the key ranges
> corresponding to the regions hosted locally on the corresponding region
> server.
>
> Cheers,
>
> Joep
>
> Sent from my iPhone
>
> On Jun 2, 2012, at 6:25 PM, Jonathan Bishop <jbishop....@gmail.com> wrote:
>
> > Hi,
> >
> > I am new to Hadoop and HBase, but have spent the last few weeks learning
> > as much as I can...
> >
> > I am attempting to create an HBase table during a Hadoop job by simply
> > doing puts to a table from each map task. I am hoping that each map task
> > will use the region server on its node so that all 10 of my nodes are
> > putting values into the table at the same time.
> >
> > Here is my map class below. The Node class is a simple data structure
> > which knows how to parse a line of input and create a Put for HBase.
> >
> > When I run this I see that only one region server is active for the
> > table I am creating. I know that my input file is split among all 10 of
> > my data nodes, and I know that if I do not do puts to the HBase table
> > everything runs in parallel on all 10 machines. It is only when I start
> > doing HBase puts that the run times go way up.
> >
> > Thanks,
> >
> > Jon
> >
> > public static class MapClass extends Mapper<Object, Text, IntWritable, Node> {
> >     HTableInterface table = null;
> >
> >     @Override
> >     protected void setup(Context context) throws IOException, InterruptedException {
> >         String tableName = context.getConfiguration().get(TABLE);
> >         table = new HTable(tableName);
> >     }
> >
> >     @Override
> >     public void map(Object key, Text value, Context context)
> >             throws IOException, InterruptedException {
> >         Node node = null;
> >         try {
> >             node = Node.parseNode(value.toString());
> >         } catch (ParseException e) {
> >             throw new IOException(e); // preserve the parse failure as the cause
> >         }
> >         Put put = node.getPut();
> >         table.put(put);
> >     }
> >
> >     @Override
> >     protected void cleanup(Context context) throws IOException, InterruptedException {
> >         table.close();
> >     }
> > }
>
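Separately from locality, the map code above issues one unbuffered put per row, which in the HBase client of this era means roughly one RPC per row; disabling auto-flush (`table.setAutoFlush(false)` plus `table.setWriteBufferSize(...)` on the real HTable) usually helps a lot. The class below is only an illustration of the batching idea with plain strings, not HBase code; the batch size and names are assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of client-side write buffering: collect puts and flush in
// batches instead of sending one RPC per row. HTable offers this
// natively; this class just makes the effect countable.
public class WriteBuffer {
    private final List<String> pending = new ArrayList<String>();
    private final int batchSize;
    int flushes = 0; // number of batched "RPCs" sent

    WriteBuffer(int batchSize) {
        this.batchSize = batchSize;
    }

    void put(String row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // In the real client this is one multi-put RPC to a region server.
        flushes++;
        pending.clear();
    }
}
```

With a batch size of 100, writing 250 rows and then flushing once at the end costs 3 batched round trips instead of 250 individual ones.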
