Thanks Joep,

My table is empty when I start and will contain about 18M rows when completed.
So I guess I need to understand how to pick row keys such that the regions will be on that mapper's node. Any advice would be appreciated.

BTW, I do notice that the region servers of other nodes become busy, but only after a large number of rows have been processed - say 10%. It would be better if I could deliberately control which regions/region servers were going to be used, though, to prevent the network traffic of sending rows to region servers on other nodes.

Jon

On Sun, Jun 3, 2012 at 12:02 PM, Joep Rottinghuis <jrottingh...@gmail.com> wrote:
> How large is your table?
> If it is newly created and still almost empty, then it will probably
> consist of only one region, which will be hosted on one region server.
>
> Even as the table grows and gets split into multiple regions, you will
> have to split your mappers in such a way that each writes to the key ranges
> corresponding to the regions hosted locally on the corresponding region
> server.
>
> Cheers,
>
> Joep
>
> Sent from my iPhone
>
> On Jun 2, 2012, at 6:25 PM, Jonathan Bishop <jbishop....@gmail.com> wrote:
>
> > Hi,
> >
> > I am new to Hadoop and HBase, but have spent the last few weeks learning
> > as much as I can...
> >
> > I am attempting to create an HBase table during a Hadoop job by simply
> > doing puts to a table from each map task. I am hoping that each map task
> > will use the region server on its node so that all 10 of my nodes are
> > putting values into the table at the same time.
> >
> > Here is my map class below. The Node class is a simple data structure
> > which knows how to parse a line of input and create a Put for HBase.
> >
> > When I run this I see that only one region server is active for the
> > table I am creating. I know that my input file is split among all 10 of
> > my data nodes, and I know that if I do not do puts to the HBase table
> > everything runs in parallel on all 10 machines. It is only when I start
> > doing HBase puts that the run times go way up.
> >
> > Thanks,
> >
> > Jon
> >
> > public static class MapClass extends Mapper<Object, Text, IntWritable, Node> {
> >
> >     HTableInterface table = null;
> >
> >     @Override
> >     protected void setup(Context context) throws IOException, InterruptedException {
> >         String tableName = context.getConfiguration().get(TABLE);
> >         table = new HTable(tableName);
> >     }
> >
> >     @Override
> >     public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
> >         Node node = null;
> >         try {
> >             node = Node.parseNode(value.toString());
> >         } catch (ParseException e) {
> >             throw new IOException(e);
> >         }
> >         Put put = node.getPut();
> >         table.put(put);
> >     }
> >
> >     @Override
> >     protected void cleanup(Context context) throws IOException, InterruptedException {
> >         table.close();
> >     }
> > }
>
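Following up on Joep's point about the new table starting as a single region: one common workaround is to pre-split the table at creation time so that each region server hosts a region from the start, instead of waiting for organic splits. Below is a minimal sketch, under the assumption that the row keys produced by Node.getPut() are zero-padded decimal strings covering roughly 0..18M; if your key scheme differs, the boundary computation must be adapted. The split-point computation itself is plain Java; the table-creation call is shown only as a comment using the 0.92-era HBaseAdmin API.

```java
import java.nio.charset.StandardCharsets;

// Sketch: compute evenly spaced split points so a newly created table
// starts with one region per region server rather than a single region.
// Assumes zero-padded decimal row keys, e.g. "00000000".."17999999".
public class SplitKeys {

    // Returns (numRegions - 1) boundary keys dividing [0, maxKey) evenly.
    static byte[][] computeSplits(long maxKey, int numRegions, int width) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            long boundary = maxKey * i / numRegions;
            splits[i - 1] = String.format("%0" + width + "d", boundary)
                    .getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        // 18M rows, 10 region servers, 8-character keys.
        byte[][] splits = computeSplits(18_000_000L, 10, 8);
        for (byte[] s : splits) {
            System.out.println(new String(s, StandardCharsets.UTF_8));
        }
        // With HBase on the classpath, the splits would be handed to the
        // table-creation call, e.g.:
        //   HBaseAdmin admin = new HBaseAdmin(conf);
        //   admin.createTable(tableDescriptor, splits);
    }
}
```

Note that pre-splitting spreads the write load across region servers, but it does not guarantee that each mapper writes to the region hosted on its own node; that would still require partitioning the input by key range as Joep describes. Separately, for raw put throughput, batching via the client write buffer (table.setAutoFlush(false) in this era's API, plus a final flushCommits() in cleanup) usually helps much more than per-row puts.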