The keys are Text and the values are large custom data structures serialized with Avro.
I also have counters for the job that generates these files that give me this information, but sometimes... well, it's a long story. Suffice it to say that it's nice to have a post-hoc method too. :-) The identity mapper sounds like the way to go.
