Hi there- You probably want to review this section of the RefGuide: http://hbase.apache.org/book.html#mapreduce
re: "it's inefficient to have one scan object to scan everything." It is. But in the MapReduce case there is a Map task for each input split (see the RefGuide for details), and therefore a Scanner instance per Map task.

On 1/18/13 5:43 PM, "Xu, Leon" <guodo...@amazon.com> wrote:

>Hi HBase users,
>
>I am currently trying to set up a denormalization MapReduce job for my
>HBase table.
>Since our table contains a large volume of data, it's inefficient to have
>one scan object scan everything. We only need to process the records
>that have changed. I am planning to have multiple scan objects, each
>specifying a range, since we keep track of which rows have changed.
>So I am trying to set up the MapReduce job with multiple scan objects;
>is this possible?
>I have seen some posts online suggesting extending the InputFormat class
>and overriding getSplits; is this the most efficient way?
>
>Using a filter seems inefficient in my case because it basically still
>scans the whole table, right? It just filters out certain records.
>
>Can you point me in the right direction?
>
>
>Thanks
>Leon
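FWIW, a sketch of what multiple-Scan job setup might look like, assuming an HBase version where MultiTableInputFormat and the List<Scan> overload of TableMapReduceUtil.initTableMapperJob are available (0.94.5+, HBASE-3996); the table name "t1" and the row-key ranges are made up for illustration:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MultiScanDenormJob {

  // Each Map task gets its own Scanner and sees only the rows covered
  // by the input split derived from one Scan range.
  static class DenormMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      // ... denormalize the changed row here ...
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "denormalize-changed-rows");
    job.setJarByClass(MultiScanDenormJob.class);

    // One Scan per known changed-row range (ranges are hypothetical).
    List<Scan> scans = new ArrayList<Scan>();
    for (String[] range : new String[][] {{"a000", "a999"}, {"m000", "m999"}}) {
      Scan scan = new Scan(Bytes.toBytes(range[0]), Bytes.toBytes(range[1]));
      scan.setCaching(500);         // larger caching is typical for MR scans
      scan.setCacheBlocks(false);   // don't pollute the block cache
      // MultiTableInputFormat requires each Scan to name its table:
      scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes("t1"));
      scans.add(scan);
    }

    // The List<Scan> overload wires up MultiTableInputFormat; the splits
    // come from each Scan's intersection with the table's regions, so
    // only the specified ranges are read -- no full-table scan + filter.
    TableMapReduceUtil.initTableMapperJob(scans, DenormMapper.class,
        NullWritable.class, NullWritable.class, job);
    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The alternative the poster mentions (subclassing TableInputFormat and overriding getSplits) also works, but on versions that already ship MultiTableInputFormat the List<Scan> approach avoids maintaining custom split logic.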