Hi,

I've read that the combiner only runs if it is specified AND the in-memory sort buffer overflows in the mapper:
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201107.mbox/%3c374d8f3f-b8b1-499f-bedb-bfee32190...@hortonworks.com%3E
But when I run a Hadoop streaming job from R using RHadoop, the combiner always runs when it is specified, even on a very small dataset. Is this the expected behaviour?

Thanks,
Sudip Sinha