xiefeng created SPARK-17327:
-------------------------------
Summary: Throughput limitation in Spark standalone for a simple task
without calculation.
Key: SPARK-17327
URL: https://issues.apache.org/jira/browse/SPARK-17327
Project: Spark
Issue Type: Question
Components: Java API, Windows
Affects Versions: 1.6.2
Environment: windows server 2008 R2 standard
Reporter: xiefeng
Fix For: 1.6.2
I installed a Spark standalone cluster (one master and one worker) on a
Windows Server 2008 machine with 16 cores and 24 GB of memory.
I ran a simple test: just create a string RDD and return its first element.
I used JMeter to measure throughput, but the highest rate is only around
35 requests/sec. Spark is powerful at distributed calculation, so why is
throughput so limited in a test scenario that contains only simple task
dispatch and no calculation?
1. In JMeter I tested with both 10 and 100 threads; the difference is small,
around 2-3 requests/sec.
2. I tested both with and without caching the RDD; the difference is small,
around 1-2 requests/sec.
3. During the test, CPU and memory usage stayed low.
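One plausible explanation (an assumption, not confirmed in this report) is that every call to rddData.first() launches a full Spark job through the driver's scheduler, and that per-job launch overhead, not CPU work, caps throughput. If each action costs roughly 30 ms of scheduling, sequential throughput cannot exceed about 33/sec no matter how many JMeter threads are used. A minimal plain-Java sketch of this ceiling (no Spark dependency; the Supplier stands in for the RDD action, and the 30 ms sleep is an assumed overhead figure):

```java
import java.util.function.Supplier;

// Micro-benchmark sketch: fixed per-request latency puts a hard ceiling
// on throughput. Each supplier call models one Spark action (e.g.
// rddData.first()), which pays job-scheduling overhead before any work runs.
public class ThroughputCeiling {

    // Run `calls` sequential requests and return the observed requests/sec.
    static double measure(Supplier<String> action, int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            action.get();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return calls / seconds;
    }

    public static void main(String[] args) {
        // Simulate ~30 ms of per-action overhead (an assumed value).
        Supplier<String> fakeAction = () -> {
            try { Thread.sleep(30); } catch (InterruptedException e) { }
            return "value";
        };
        // With 30 ms per call, throughput is bounded near 33/sec.
        System.out.println("observed requests/sec: " + measure(fakeAction, 20));
    }
}
```

This matches the report's observations: adding JMeter threads or caching the RDD barely helps, because the bottleneck is per-job latency on the driver, not parallel capacity on the worker.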
Below is my test code:
@RestController
public class SimpleTest {
    @RequestMapping(value = "/SimpleTest", method = RequestMethod.GET)
    @ResponseBody
    public String testProcessTransaction() {
        return SparkShardTest.simpleRDDTest();
    }
}
final static Map<String, JavaRDD<String>> simpleRDDs = initSimpleRDDs();

public static Map<String, JavaRDD<String>> initSimpleRDDs()
{
    Map<String, JavaRDD<String>> result =
        new ConcurrentHashMap<String, JavaRDD<String>>();
    List<String> data = Arrays.asList("value1"); // placeholder test data
    JavaRDD<String> rddData = JavaSC.parallelize(data);
    rddData.cache().count(); // this cache improves throughput by only 1-2/sec
    result.put("MyRDD", rddData);
    return result;
}

public static String simpleRDDTest()
{
    JavaRDD<String> rddData = simpleRDDs.get("MyRDD");
    return rddData.first();
}
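Since the RDD contents never change between requests, one way to sidestep the per-request job cost (a suggestion of mine, not something the report tries) is to run the Spark action once and memoize its result on the driver, serving the cached string to every HTTP request afterwards. A minimal sketch, where the Supplier stands in for rddData.first():

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Driver-side memoization sketch: compute the Spark action once per key and
// serve the cached result thereafter, so only the first HTTP request pays
// the job-scheduling overhead.
public class DriverSideCache {
    private static final ConcurrentMap<String, String> cache =
        new ConcurrentHashMap<>();

    // Returns the cached value for `key`, computing it via `action`
    // (e.g. rddData.first()) only on the first call.
    static String getOrCompute(String key, Supplier<String> action) {
        return cache.computeIfAbsent(key, k -> action.get());
    }
}
```

With this in place, testProcessTransaction() would call getOrCompute("MyRDD", ...) instead of invoking first() directly, and throughput would be limited only by the web layer.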
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]