This is probably more of an u...@hbase.apache.org topic than common-user. To answer your question, you will want to pre-split the table like so: http://hbase.apache.org/book/perf.writing.html
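For example, here is a rough sketch of creating a pre-split table up front (the table name "mytable", the column family "cf", and the numeric split points are just placeholders; pick split keys that actually cover your row-key space):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplit {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            HTableDescriptor desc = new HTableDescriptor("mytable");
            desc.addFamily(new HColumnDescriptor("cf"));

            // Each split key is the first row of a new region: 9 splits -> 10 regions,
            // roughly one per region server on a 10-node cluster.
            byte[][] splits = new byte[9][];
            for (int i = 0; i < splits.length; i++) {
                splits[i] = Bytes.toBytes(String.format("%02d", (i + 1) * 10));
            }
            admin.createTable(desc, splits);
        }
    }

With the regions spread across the cluster from the start, and row keys that distribute across those ranges, your puts should fan out instead of all landing on a single region server.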
Cheers,

Joep

Sent from my iPhone

On Jun 3, 2012, at 3:45 PM, Jonathan Bishop <jbishop....@gmail.com> wrote:

> Thanks Joep,
>
> My table is empty when I start and will consist of 18M rows when completed.
>
> So I guess I need to understand how to pick row keys such that the regions
> will be on that mapper's node. Any advice would be appreciated.
>
> BTW, I do notice that the region servers of other nodes become busy, but
> only after a large number of rows have been processed - say 10%. It would
> be better if I could deliberately control which regions/region servers were
> going to be used, though, to prevent the network traffic of sending rows to
> region servers on other nodes.
>
> Jon
>
> On Sun, Jun 3, 2012 at 12:02 PM, Joep Rottinghuis <jrottingh...@gmail.com> wrote:
>
>> How large is your table?
>> If it is newly created and still almost empty, then it will probably
>> consist of only one region, which will be hosted on one region server.
>>
>> Even as the table grows and gets split into multiple regions, you will
>> have to split your mappers in such a way that each writes to the key ranges
>> corresponding to the regions hosted locally on the corresponding region
>> server.
>>
>> Cheers,
>>
>> Joep
>>
>> Sent from my iPhone
>>
>> On Jun 2, 2012, at 6:25 PM, Jonathan Bishop <jbishop....@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am new to Hadoop and HBase, but have spent the last few weeks learning
>>> as much as I can...
>>>
>>> I am attempting to create an HBase table during a Hadoop job by simply
>>> doing puts to a table from each map task. I am hoping that each map task
>>> will use the region server on its node so that all 10 of my nodes are
>>> putting values into the table at the same time.
>>>
>>> Here is my map class below. The Node class is a simple data structure
>>> which knows how to parse a line of input and create a Put for HBase.
>>>
>>> When I run this I see that only one region server is active for the
>>> table I am creating. I know that my input file is split among all 10 of
>>> my data nodes, and I know that if I do not do puts to the HBase table
>>> everything runs in parallel on all 10 machines. It is only when I start
>>> doing HBase puts that the run times go way up.
>>>
>>> Thanks,
>>>
>>> Jon
>>>
>>> public static class MapClass extends Mapper<Object, Text, IntWritable, Node> {
>>>
>>>     HTableInterface table = null;
>>>
>>>     @Override
>>>     protected void setup(Context context) throws IOException, InterruptedException {
>>>         String tableName = context.getConfiguration().get(TABLE);
>>>         table = new HTable(tableName);
>>>     }
>>>
>>>     @Override
>>>     public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
>>>         Node node = null;
>>>         try {
>>>             node = Node.parseNode(value.toString());
>>>         } catch (ParseException e) {
>>>             throw new IOException(e);
>>>         }
>>>         Put put = node.getPut();
>>>         table.put(put);
>>>     }
>>>
>>>     @Override
>>>     protected void cleanup(Context context) throws IOException, InterruptedException {
>>>         table.close();
>>>     }
>>> }
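One more thing worth trying from that same chapter: turn off auto-flush so puts are batched through the client-side write buffer instead of going out as one RPC per row. A minimal variant of the setup/cleanup methods in the MapClass above (the 12 MB buffer size is just an example to tune for your row size):

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        String tableName = context.getConfiguration().get(TABLE);
        HTable htable = new HTable(context.getConfiguration(), tableName);
        htable.setAutoFlush(false);                   // buffer puts client-side instead of one RPC per put
        htable.setWriteBufferSize(12 * 1024 * 1024);  // e.g. 12 MB; tune for your row size
        table = htable;
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        table.flushCommits();  // push any still-buffered puts before closing
        table.close();
    }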