What format is the input data in?

At first glance, I would run a job with an identity mapper and a
NullOutputFormat, so no output data gets written. The built-in
"Map input records" counter already counts the number of key/value
pairs read by the mappers.
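
A rough sketch of what I mean, not a tested utility. It assumes the
input is SequenceFiles (swap the input format if not) and the newer
org.apache.hadoop.mapreduce API as found in Hadoop 2 and later. The
class name CountPairs and the use of args[0] for the input directory
are placeholders of mine.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.TaskCounter;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Hypothetical driver class; the name and argument handling are assumptions.
public class CountPairs {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "count pairs");
        job.setJarByClass(CountPairs.class);

        // Assumption: the directory holds SequenceFiles.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // The base Mapper class passes every pair through unchanged,
        // so it serves as the identity mapper.
        job.setMapperClass(Mapper.class);

        // No reducers: map output goes straight to the output format.
        job.setNumReduceTasks(0);

        // Discard everything the mappers emit.
        job.setOutputFormatClass(NullOutputFormat.class);

        if (!job.waitForCompletion(true)) {
            System.exit(1);
        }

        // Built-in counter of records read by all map tasks.
        long pairs = job.getCounters()
                .findCounter(TaskCounter.MAP_INPUT_RECORDS)
                .getValue();
        System.out.println(pairs);
    }
}
```

The same counter also shows up in the job's console output and web UI,
so the last few lines are only a convenience. Note that this sketch
still deserializes every key and value as it reads them.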

-Joey

On Fri, May 20, 2011 at 9:34 AM, W.P. McNeill <[email protected]> wrote:
> I've got a directory with a bunch of MapReduce data in it.  I want to know
> how many <Key, Value> pairs it contains.  I could write a mapper-only
> process that takes <Writable, Writable> pairs as input and updates a
> counter, but it seems like this utility should already exist.  Does it, or
> do I have to roll my own?
>
> Bonus question, is there a way to count the number of <Key, Value> pairs
> without deserializing the values?  This can be expensive for the data I'm
> working with.
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
