If your reduce function can't assume that all values for a key fit in memory, what's the preferred way to output a collection of values? I've been using ArrayWritable, but that requires first building up an array of values in memory. This worked until I ramped up the size of the input and started getting out-of-memory errors.
IdentityReducer would work, but it seems wasteful to output the key for each value. Right now I'm doing emit(key, "") for the key and emit("", value) for each value, but this feels like a hack. It also means extra work to reassemble the output into key/value pairs, unlike the (memory-consuming) ArrayWritable approach.

Ed
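
For reference, the marker-record workaround described above might look roughly like the following sketch. It is plain Java rather than Hadoop code: `Emitter`, `PairHandler`, `reduce`, and `decode` are hypothetical names made up for illustration, with `Emitter` standing in for whatever output collector the framework provides. It also assumes real keys are never the empty string, since an empty key is what marks a value record.

```java
import java.util.Iterator;

class MarkerRecordSketch {

    /** Hypothetical stand-in for the framework's output collector. */
    interface Emitter {
        void emit(String key, String value);
    }

    /** Hypothetical callback that receives the re-paired (key, value) records. */
    interface PairHandler {
        void handle(String key, String value);
    }

    /**
     * Streams one key's values to the output without buffering them:
     * a (key, "") marker record first, then one ("", value) record per value.
     */
    static void reduce(String key, Iterator<String> values, Emitter out) {
        out.emit(key, "");
        while (values.hasNext()) {
            out.emit("", values.next());
        }
    }

    /**
     * The extra decoding step: remember the most recent marker key and
     * attach it to each value record that follows it.
     * Assumes real keys are never empty.
     */
    static void decode(Iterator<String[]> records, PairHandler handler) {
        String currentKey = null;
        while (records.hasNext()) {
            String[] rec = records.next();
            if (!rec[0].isEmpty()) {
                currentKey = rec[0];   // marker record: start of a new group
            } else {
                handler.handle(currentKey, rec[1]);   // value record
            }
        }
    }
}
```

Both methods hold only one record at a time, so memory use does not grow with the number of values per key. The cost is that the reader depends on record order: a value record only makes sense after the marker that precedes it.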