What format is the input data in? At first glance, I would run a map-only job with an identity mapper and NullOutputFormat so that no data gets written. The built-in counters already track the number of <Key, Value> pairs read by the mappers (the "Map input records" counter), so you can read the total from the job's counters once it finishes.
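
A minimal sketch of that job, in case it helps. It assumes the Hadoop 2.x (or later) org.apache.hadoop.mapreduce API and assumes the input is SequenceFiles; swap in whatever InputFormat matches your data. The class name CountPairs and the helper buildJob are made up for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.TaskCounter;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Hypothetical driver: counts <Key, Value> pairs using only built-in counters.
public class CountPairs {

    // Builds a map-only job that reads every record and writes nothing.
    public static Job buildJob(Configuration conf, Path input) throws IOException {
        Job job = Job.getInstance(conf, "count pairs");
        job.setJarByClass(CountPairs.class);

        // ASSUMPTION: the data is stored as SequenceFiles.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job, input);

        // The base Mapper class is an identity mapper.
        job.setMapperClass(Mapper.class);
        // No reducers, so there is no shuffle or sort.
        job.setNumReduceTasks(0);
        // Discard everything the mappers emit.
        job.setOutputFormatClass(NullOutputFormat.class);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = buildJob(new Configuration(), new Path(args[0]));
        if (!job.waitForCompletion(true)) {
            System.exit(1);
        }
        // Built-in counter: records read by all mappers.
        long pairs = job.getCounters()
                .findCounter(TaskCounter.MAP_INPUT_RECORDS)
                .getValue();
        System.out.println("Pairs: " + pairs);
    }
}
```

You would run it with something like "hadoop jar countpairs.jar CountPairs <input dir>" (the jar name is a placeholder). The same number also shows up in the job's counter summary as "Map input records".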

-Joey

On Fri, May 20, 2011 at 9:34 AM, W.P. McNeill <[email protected]> wrote:
> I've got a directory with a bunch of MapReduce data in it. I want to know
> how many <Key, Value> pairs it contains. I could write a mapper-only
> process that takes <Writeable, Writeable> pairs as input and updates a
> counter, but it seems like this utility should already exist. Does it, or
> do I have to roll my own?
>
> Bonus question, is there a way to count the number of <Key, Value> pairs
> without deserializing the values? This can be expensive for the data I'm
> working with.
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
