Hi, may I ask another question?
I'm running HBase/Hadoop on a Linux server and implementing a business application in Java, which runs on a separate Windows machine. It looks like a MapReduce job runs on a server node. Can I run a MapReduce job built on the Windows client against the existing Linux server? And how can we get back the result of a MapReduce job that ran on the server? E.g. scanning a specific table with some filter conditions and returning the sum of specific columns...

Regards,
Jaeyun Noh.

On Wed, Oct 8, 2008 at 2:13 PM, stack <[EMAIL PROTECTED]> wrote:

> Cedric Ho wrote:
>
>> Hi all,
>>
>> I am using 0.18.0 and have successfully used data from an hbase table as
>> input to my map/reduce job.
>>
>> I wonder how to specify a subset of records from a table instead of
>> taking all records as input,
>> such as a range of the row keys or maybe specific values of certain
>> columns.
>>
>
> You'll have to subclass the TableInputFormat.
>
> There is an example in the javadoc on subclassing TIF:
> http://hadoop.apache.org/hbase/docs/r0.18.0/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
> (Sorry, the example is mangled. Do a get of the html source to see
> non-garbled code).
>
> The example shows you how to set a filter. Filters can filter on rows and
> values.
>
> To work against a subset, you'd probably need to play with getSplits in
> your subclass. By default, it basically returns as many splits as there are
> regions in your table, so it's always the whole table. Filters could stop
> unwanted rows being returned, but maybe it's better if the rows weren't
> considered in the first place; hence the need to override getSplits.
>
> St.Ack
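
To make the getSplits suggestion above a bit more concrete, here is a minimal sketch of the pruning idea in plain Java. It deliberately does not use the HBase API: the names SplitPruning, KeyRange and pruneSplits are hypothetical, row keys are modelled as Strings rather than byte arrays, and an empty string is assumed to mean "unbounded" on that side of a range. The idea is to start from one range per region, drop the regions that cannot contain any wanted row, and clip the rest to the wanted range. A real TableInputFormat subclass would apply the same logic to the table's actual region boundaries inside its own getSplits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitPruning {

    /** Half-open row-key range [start, end). An empty string means "unbounded" (assumption). */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Returns the larger of two start keys; "" sorts lowest, which matches "unbounded below". */
    private static String maxStart(String a, String b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    /** Returns the smaller of two end keys, treating "" as "unbounded above". */
    private static String minEnd(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    /**
     * Keeps only the regions that overlap the wanted range and clips each
     * kept region to that range, so rows outside it are never scanned.
     */
    static List<KeyRange> pruneSplits(List<KeyRange> regions, KeyRange wanted) {
        List<KeyRange> kept = new ArrayList<>();
        for (KeyRange r : regions) {
            // Region lies entirely before the wanted range.
            boolean endsBefore = !r.end.isEmpty() && !wanted.start.isEmpty()
                    && r.end.compareTo(wanted.start) <= 0;
            // Region lies entirely after the wanted range.
            boolean startsAfter = !wanted.end.isEmpty()
                    && r.start.compareTo(wanted.end) >= 0;
            if (endsBefore || startsAfter) {
                continue;
            }
            kept.add(new KeyRange(maxStart(r.start, wanted.start), minEnd(r.end, wanted.end)));
        }
        return kept;
    }

    public static void main(String[] args) {
        // Three made-up regions covering the whole key space.
        List<KeyRange> regions = Arrays.asList(
                new KeyRange("", "g"), new KeyRange("g", "p"), new KeyRange("p", ""));
        System.out.println(pruneSplits(regions, new KeyRange("h", "k")));
    }
}
```

With the default behaviour every region becomes a split; with pruning like this, a job asking for rows from "h" up to "k" would get a single split instead of three. A filter can still be applied on top to select by column values within the remaining rows.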
