Is there no way to find out inside a single reducer how many records were created by all the Mappers? I tried several ways but nothing works. For example, I tried this:
reporter.getCounter(Task.Counter.REDUCE_INPUT_RECORDS).getValue();

It's not working for me. Should this have worked? Am I just doing something dumb? I would rather not create another MR job just to count the number of lines.

On Sat, May 12, 2012 at 7:07 PM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:

> I did a very similar approach and it worked fine for me. Just spot check
> the regions after to make sure they look lexicographically sorted. I used
> ImmutableBytesWritable as my key, and the default Hadoop sorting for that
> turned out to sort lexicographically as required. Our HBase rows varied in
> size, so instead of doing a count of the number of rows, we tallied up the
> KeyValue.getLength() for each KeyValue in a row until the size reached a
> certain limit.
>
> On Sat, May 12, 2012 at 7:21 PM, Something Something <
> mailinglist...@gmail.com> wrote:
>
> > Hello,
> >
> > This is really a MapReduce question, but the output from this will be used
> > to create regions for an HBase table. Here's what I want to do:
> >
> > Take an input file that contains data about users.
> > Sort this file by a key (which consists of a few fields from the row)
> > After every x # of rows write the key.
> >
> > Here's how I was going to structure my MapReduce:
> >
> > public Splitter {
> >
> >     static int counter;
> >
> >     private Mapper {
> >         map() {
> >             Build key by concatenating fields
> >             Write key
> >             increment counter;
> >         }
> >     }
> >
> >     // # of reducers will be set to 1. My understanding is that this will
> >     // send the lines to reducer in sorted order one at a time - is this a
> >     // correct assumption?
> >     private Reducer {
> >         static long i;
> >         reduce() {
> >             static long splitSize = counter / 300; // 300 is region size
> >             if (i == 0 || i == splitSize) {
> >                 Write key; // this will be used as a 'startkey'.
> >                 i = 0;
> >             }
> >             i++;
> >         }
> >     }
> > }
> >
> > To summarize, there are 2 questions:
> >
> > 1) I am passing # of rows processed by Mapper to Reducer via a static
> > counter. Would this work? Is there a better way?
> > 2) If I set # of reducers to 1, would the lines be sent to reducer in
> > sorted order one at a time?
> >
> > Thanks in advance for the help.
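
For what it's worth, the size-based approach Bryan describes in the thread does not need the total record count at all: the single reducer accumulates bytes as the sorted rows arrive and writes a key each time a limit is reached. (A static field will not carry the count from the Mapper to the Reducer anyway, since map and reduce tasks run in separate JVMs.) Below is a minimal sketch of that logic in plain Java, outside of Hadoop. The class name SplitKeySketch, the method splitKeys, the maxRegionBytes parameter, and the sample keys and sizes are all made up for illustration; in a real job the sizes would presumably come from the KeyValue lengths and the loop body would live in reduce().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class SplitKeySketch {

    // Walks rows in sorted key order and returns the key that opens each new
    // region once the running byte total has reached maxRegionBytes.
    // The first region has no entry here; it starts at the lowest key.
    static List<String> splitKeys(SortedMap<String, Long> rowSizes, long maxRegionBytes) {
        List<String> splits = new ArrayList<>();
        long bytesInRegion = 0;
        for (Map.Entry<String, Long> row : rowSizes.entrySet()) {
            if (bytesInRegion >= maxRegionBytes) {
                splits.add(row.getKey()); // this row opens the next region
                bytesInRegion = 0;
            }
            bytesInRegion += row.getValue();
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical row keys mapped to their sizes in bytes.
        // TreeMap keeps them sorted, standing in for the shuffle/sort phase.
        SortedMap<String, Long> rows = new TreeMap<>();
        rows.put("user-a", 40L);
        rows.put("user-b", 70L);
        rows.put("user-c", 30L);
        rows.put("user-d", 90L);
        rows.put("user-e", 20L);
        System.out.println(splitKeys(rows, 100L)); // prints [user-c, user-e]
    }
}
```

With the sample data, the first two rows total 110 bytes, so "user-c" opens the second region; the next two total 120 bytes, so "user-e" opens the third. Because the limit is checked before each row is added, a region can overshoot the limit by up to one row, which should be fine for pre-splitting purposes.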