Todd Lipcon wrote:
Ah, I misunderstood.

How about this?

mapper:
  for each word in line:
    add word to a set()
  for each word in set:
    emit (word, 1)
  emit (null, 1)

Oh, that's a good idea. Put the hashset in the mapper rather than the reducer. Thanks.

I don't mind post-processing in this case, I have to do that anyway.
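
For illustration, here is a rough Python sketch of the mapper idea above, run locally rather than on a cluster. The function names (map_line, reduce_counts, run) are hypothetical, splitting on whitespace is an assumption about what counts as a word, and None stands in for the null key. The reducer shown is just a plain sum, which is a guess at what the real job would do.

```python
from collections import Counter


def map_line(line):
    """Emit (word, 1) once per distinct word in the line, then (None, 1) for the line."""
    # The set removes repeated words within a single line; sorting only
    # makes the output order deterministic for this local sketch.
    pairs = [(word, 1) for word in sorted(set(line.split()))]
    # The null-key record lets the reducer count how many lines were seen.
    pairs.append((None, 1))
    return pairs


def reduce_counts(pairs):
    """Sum the values for each key, as a simple summing reducer would."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(lines):
    """Local stand-in for the map and reduce phases over an iterable of lines."""
    pairs = []
    for line in lines:
        pairs.extend(map_line(line))
    return reduce_counts(pairs)
```

With this, run(["a b a", "b c"]) counts "a" once even though it appears twice on the first line, and the None key ends up holding the number of lines, which is what the post-processing step would divide by.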

Jim
