Hi,

I'm reposting this as I've not received any reply to my earlier post on the same issue.

I've read that the combiner only runs if it is specified AND the sort memory buffer overflows in the mapper:
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201107.mbox/%3c374d8f3f-b8b1-499f-bedb-bfee32190...@hortonworks.com%3E

But when I run a Hadoop streaming job in R using RHadoop, the combiner always runs when it is specified, even on a very small dataset. Is this the expected behaviour?

More on this:
https://github.com/RevolutionAnalytics/RHadoop/issues/70

Thanks,
Sudip Sinha