Hi, I'm running a few tests on a small data set: about 150 MB of input, which produces 150,000 unique map output records and 150,000 reducer output records.
When I run the job locally (as a Java application from Eclipse, using LocalJobRunner), the reducer finishes in 0 seconds. Running the same code on the same machine within the Hadoop framework (in the Google Hadoop VMware image) always results in a reduce phase of over 10 seconds (1 reducer, measured from map 100% until reduce 100%). Running it outside VMware on an Amazon EC2 cluster gives me about the same results, even with more nodes in the cluster.

Any ideas on what might cause the slowdown? Is this simply Hadoop framework overhead that I have to live with?

Thanks,
Thibaut

--
View this message in context: http://www.nabble.com/Reduce-Performance-%28LocalJobRunner-vs-Hadoop-Framework%29-tp14372547p14372547.html
Sent from the Hadoop Users mailing list archive at Nabble.com.
