Hi, I am new to Hadoop and HBase, but I have spent the last few weeks learning as much as I can.
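I am attempting to create an HBase table during a Hadoop job by simply doing puts to the table from each map task. I am hoping that each map task will use the region server on its own node, so that all 10 of my nodes are putting values into the table at the same time. My map class is below. The Node class is a simple data structure that knows how to parse a line of input and create a Put for HBase.

When I run this, I see that only one region server is active for the table I am creating. I know that my input file is split among all 10 of my data nodes, and I know that if I do not do puts to the HBase table, everything runs in parallel on all 10 machines. It is only when I start doing HBase puts that the run times go way up.

Thanks,
Jon

```java
// Imports for the enclosing job class. ParseException is assumed to be
// java.text.ParseException; use whichever type Node.parseNode declares.
import java.io.IOException;
import java.text.ParseException;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested in the job class, which defines the TABLE configuration key.
public static class MapClass extends Mapper<Object, Text, IntWritable, Node> {

    HTableInterface table = null;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Open the target table once per map task, using the job's configuration.
        String tableName = context.getConfiguration().get(TABLE);
        table = new HTable(context.getConfiguration(), tableName);
    }

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        Node node = null;
        try {
            // Parse one input line into a Node.
            node = Node.parseNode(value.toString());
        } catch (ParseException e) {
            // Keep the parse error as the cause instead of discarding it.
            throw new IOException(e);
        }
        // Write the row directly to HBase; nothing is emitted to the context.
        Put put = node.getPut();
        table.put(put);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Closing the table also flushes any buffered puts.
        if (table != null) {
            table.close();
        }
    }
}
```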
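For context on where each put actually goes, here is a toy model in plain Java (a sketch, not HBase code). It assumes the usual HBase layout: a table is a sorted set of regions, each covering a contiguous range of row keys beginning at its start key, and a client sends each Put to whichever region server hosts the region containing that row, regardless of which node the map task runs on. A table created without split points starts as a single region, so in this model every map task writes to the same server until that region splits. The class `RegionRoutingSketch`, the server names, and the split points "d" and "m" are all hypothetical, and Java `String` ordering stands in for HBase's byte-wise row key ordering.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical toy model of row-key routing (not HBase source code).
 * Each region is identified by its start key; a row belongs to the region
 * with the greatest start key that is <= the row key.
 */
public class RegionRoutingSketch {
    // region start key -> server hosting that region (illustrative names only)
    private final TreeMap<String, String> regions = new TreeMap<>();

    public RegionRoutingSketch(Map<String, String> startKeyToServer) {
        regions.putAll(startKeyToServer);
    }

    /** Returns the server hosting the region whose key range contains rowKey. */
    public String serverFor(String rowKey) {
        Map.Entry<String, String> region = regions.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalArgumentException("no region covers " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        // A freshly created table: one region starting at the empty key.
        RegionRoutingSketch fresh = new RegionRoutingSketch(Map.of("", "node1"));
        // The same table pre-split at hypothetical keys "d" and "m".
        RegionRoutingSketch split = new RegionRoutingSketch(
                Map.of("", "node1", "d", "node2", "m", "node3"));

        for (String row : new String[] {"apple", "kiwi", "zebra"}) {
            System.out.println(row + ": fresh -> " + fresh.serverFor(row)
                    + ", pre-split -> " + split.serverFor(row));
        }
    }
}
```

Running `main` shows all three sample rows landing on `node1` for the single-region table, while the pre-split table spreads them across `node1`, `node2`, and `node3`. The node that issues the put never enters the lookup; only the row key does.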