Hi all,
I am using PigMix to test the performance of Pig on
Spark (PIG-4937<https://issues.apache.org/jira/browse/PIG-4937>). The test data
set is 1 TB. After generating all the test data, I ran the first round of tests
in MR mode.
The cluster has 8 nodes. Each node has 40 cores and 60 GB of memory, of which
28 cores and 56 GB are assigned to the NodeManager. In total, the cluster
offers 224 cores and 448 GB of memory to YARN.
Snippet of yarn-site.xml:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>57344</value>
<description>the amount of memory on the NodeManager in MB</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>28</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>57344</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for
containers</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting
memory limits for containers</description>
</property>
Snippet of mapred-site.xml:
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1638m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx3276m</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>820</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>1200000</value>
</property>
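As a rough cross-check of these settings, the sketch below estimates how many map and reduce containers fit on the cluster and how much of each container goes to the JVM heap. The numbers are copied from the snippets above. The one-vcore-per-task figure is an assumption (it is not set in my config), and the estimate ignores the ApplicationMaster container, so treat the results as upper bounds.

```python
# Values copied from the yarn-site.xml and mapred-site.xml snippets in this message.
NODES = 8
NODE_MEMORY_MB = 57344      # yarn.nodemanager.resource.memory-mb
NODE_VCORES = 28            # yarn.nodemanager.resource.cpu-vcores
MAP_MEMORY_MB = 2048        # mapreduce.map.memory.mb
REDUCE_MEMORY_MB = 4096     # mapreduce.reduce.memory.mb
MAP_HEAP_MB = 1638          # -Xmx in mapreduce.map.java.opts
REDUCE_HEAP_MB = 3276       # -Xmx in mapreduce.reduce.java.opts

# Assumption: each task container asks for 1 vcore (not stated in the message).
VCORES_PER_TASK = 1


def containers_per_node(container_mb, vcores_per_task=VCORES_PER_TASK):
    """Containers that fit on one node, limited by memory and by vcores."""
    by_memory = NODE_MEMORY_MB // container_mb
    by_vcores = NODE_VCORES // vcores_per_task
    return min(by_memory, by_vcores)


def heap_fraction(heap_mb, container_mb):
    """Share of the container's memory that is given to the JVM heap."""
    return heap_mb / container_mb


if __name__ == "__main__":
    for name, container_mb, heap_mb in [
        ("map", MAP_MEMORY_MB, MAP_HEAP_MB),
        ("reduce", REDUCE_MEMORY_MB, REDUCE_HEAP_MB),
    ]:
        per_node = containers_per_node(container_mb)
        fraction = round(heap_fraction(heap_mb, container_mb), 2)
        # Columns: task type, containers per node, containers per cluster, heap share
        print(name, per_node, per_node * NODES, fraction)
```

Under these assumptions, the map side is bounded by memory and vcores alike (28 containers per node), while the reduce side is bounded by memory (14 per node). In both cases the heap is about 80% of the container size.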
Snippet of hdfs-site.xml:
<property>
<name>dfs.blocksize</name>
<value>1124217344</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.socket.timeout</name>
<value>1200000</value>
</property>
<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>1200000</value>
</property>
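To get a feel for what this block size means for map parallelism, here is a back-of-the-envelope sketch. It assumes that "1TB" means 1 TiB stored as one contiguous input, that each HDFS block becomes one map task, and that 224 map tasks run at once (8 nodes x 28 vcores). None of this is measured: the PigMix data is spread over several files and Pig may combine or split inputs differently, so the real task count will differ.

```python
DFS_BLOCKSIZE = 1124217344   # dfs.blocksize from the hdfs-site.xml snippet, in bytes
DATA_BYTES = 1024 ** 4       # assumption: "1TB" read as 1 TiB in a single input
CONCURRENT_MAPS = 224        # assumption: 8 nodes x 28 vcores, one map task per vcore


def ceil_div(numerator, denominator):
    """Integer ceiling division without floating-point rounding."""
    return -(-numerator // denominator)


def estimate_map_waves(data_bytes, block_size, concurrent_maps):
    """Return (number of blocks, number of map waves) under the assumptions above."""
    blocks = ceil_div(data_bytes, block_size)
    waves = ceil_div(blocks, concurrent_maps)
    return blocks, waves


if __name__ == "__main__":
    blocks, waves = estimate_map_waves(DATA_BYTES, DFS_BLOCKSIZE, CONCURRENT_MAPS)
    print("blocks:", blocks, "map waves:", waves)
```

With these inputs the estimate is 979 blocks processed in 5 waves of map tasks, each task reading a little over 1 GiB.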
Below are the results of the last PigMix run in MR mode (L9, L10, L13, L14 and
L17 failed). Most scripts take about 6 hours each. Should it really take this
long to run L1 through L17? Could anyone with PigMix experience share their
configuration and expected results with me?
Script  MR (sec)
L1      21544
L2      20482
L3      21629
L4      20905
L5      20738
L6      24131
L7      21983
L8      24549
L9      6585 (Fail)
L10     22286 (Fail)
L11     21849
L12     21266
L13     11099 (Fail)
L14     43 (Fail)
L15     23808
L16     42889
L17     10 (Fail)
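For anyone who wants to compare against their own numbers, the timings above can be summarized with a small script like the one below. The data is copied from the results in this message. The choice of mean and median over the successful scripts only is mine, not something PigMix itself reports.

```python
from statistics import mean, median

# (elapsed seconds, failed?) per PigMix script, copied from the results above.
RESULTS = {
    "L1": (21544, False), "L2": (20482, False), "L3": (21629, False),
    "L4": (20905, False), "L5": (20738, False), "L6": (24131, False),
    "L7": (21983, False), "L8": (24549, False), "L9": (6585, True),
    "L10": (22286, True), "L11": (21849, False), "L12": (21266, False),
    "L13": (11099, True), "L14": (43, True), "L15": (23808, False),
    "L16": (42889, False), "L17": (10, True),
}


def summarize(results):
    """Return (failed script names, mean seconds, median seconds) for successful runs."""
    failed = [name for name, (_, is_failed) in results.items() if is_failed]
    ok_times = [secs for secs, is_failed in results.values() if not is_failed]
    return failed, mean(ok_times), median(ok_times)


if __name__ == "__main__":
    failed, mean_secs, median_secs = summarize(RESULTS)
    print("failed:", ", ".join(failed))
    print("mean hours (successful):", round(mean_secs / 3600, 2))
    print("median hours (successful):", round(median_secs / 3600, 2))
```

For the 12 successful scripts the median is 21739 seconds, about 6.0 hours. The mean is higher, about 6.6 hours, because L16 took roughly twice as long as the rest.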
Best Regards,
Kelly Zhang / Zhang, Liyun