Difference in Scan class behavior in MapReduce

Doug Meil Fri, 23 Oct 2009 04:31:12 -0700

I apologize if this has been brought up before, but the Scan class acts 
differently in regular client queries than in MapReduce jobs configured by 
TableMapReduceUtil.  I'm using the 0.20.0 release in standalone mode at the 
moment for a proof of concept.


1.  Startrow/Stoprow

    Scan scan = new Scan( startRow, stopRow );

The "startrow" and "stoprow" arguments don't seem to be honored in a MapReduce 
job, and the scan turns into a full table scan.
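
For reference, here is a plain-Java sketch of the range semantics a bounded scan is expected to follow: the start row is inclusive, the stop row is exclusive, keys are compared byte by byte as unsigned values, and an empty bound means "unbounded" on that side. The names RowRangeSketch, inRange and select are hypothetical and no HBase classes are involved; the sketch only illustrates which rows a job should see when the bounds are honored.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: models a [startRow, stopRow) key range.
class RowRangeSketch {

    // Lexicographic comparison that treats each byte as an unsigned value (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal, so the shorter key sorts first.
        return a.length - b.length;
    }

    // True if row lies in [startRow, stopRow); an empty bound is treated as unbounded.
    static boolean inRange(byte[] row, byte[] startRow, byte[] stopRow) {
        boolean atOrAfterStart = startRow.length == 0 || compareUnsigned(row, startRow) >= 0;
        boolean beforeStop = stopRow.length == 0 || compareUnsigned(row, stopRow) < 0;
        return atOrAfterStart && beforeStop;
    }

    // Returns the row keys that a scan bounded by startRow/stopRow should visit.
    static List<String> select(List<String> rowKeys, String startRow, String stopRow) {
        byte[] start = startRow.getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRow.getBytes(StandardCharsets.UTF_8);
        List<String> selected = new ArrayList<>();
        for (String key : rowKeys) {
            if (inRange(key.getBytes(StandardCharsets.UTF_8), start, stop)) {
                selected.add(key);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("row-a", "row-b", "row-c", "row-d");
        System.out.println(select(keys, "row-b", "row-d")); // bounded range
        System.out.println(select(keys, "", ""));           // both bounds empty: every row
    }
}
```

With the sample keys above, the bounded call selects only row-b and row-c, while the unbounded call selects all four rows, which is what the MapReduce job appears to be doing regardless of the bounds.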

2.  Column selection

If you use this instance of Scan...

    Scan scan = new Scan( startRow, stopRow );

... in regular client activity this instance will allow selection of attributes 
in the Result.  However, this same instance used in a MapReduce job will 
produce the following exception:

    Exception in thread "main" java.io.IOException: Expecting at least one column.
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:281)

The remedy is to call either "addColumn" or "addFamily" on the Scan instance as 
appropriate, but it's a little odd that the same Scan works in one use case and 
throws an exception in the other.



Doug Meil
Director of Engineering
doug.m...@explorys.net
