[ https://issues.apache.org/jira/browse/PIG-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393408#comment-15393408 ]
liyunzhang_intel commented on PIG-4937:
---------------------------------------
[~rohini] and [~daijy]:
After generating all the test data (1 TB), I have run the first round of tests in
MR mode.
The cluster has 8 nodes; each node has 40 cores and 60 GB of memory, of which 28
cores and 56 GB are assigned to the NodeManager. In total the cluster has 224
cores and 448 GB of memory.
A snippet of yarn-site.xml:
{code}
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>57344</value>
  <description>The amount of memory on the NodeManager in MB</description>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>28</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>57344</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
  <description>Ratio of virtual memory to physical memory when setting memory limits for containers</description>
</property>
{code}
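These yarn-site values give each node room for exactly 28 minimum-size containers, one per assigned vcore. A minimal Python sketch of that arithmetic (values copied from the snippet above; the script is only an illustration, not part of PigMix):
{code}
# Per-node container capacity implied by the yarn-site.xml values above.
node_memory_mb = 57344   # yarn.nodemanager.resource.memory-mb
node_vcores = 28         # yarn.nodemanager.resource.cpu-vcores
min_alloc_mb = 2048      # yarn.scheduler.minimum-allocation-mb

by_memory = node_memory_mb // min_alloc_mb  # 28 containers at the minimum allocation
by_vcores = node_vcores                     # one vcore per container by default

# -> 28 containers per node, 8 * 28 = 224 cluster-wide
print(min(by_memory, by_vcores))
{code}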
A snippet of mapred-site.xml:
{code}
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>820</value>
</property>
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value>
</property>
{code}
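The heap sizes above follow the common convention of giving the JVM roughly 80% of its container, with mapreduce.task.io.sort.mb at about half the map heap; that is an assumed rule of thumb on my part, not something PigMix requires. A quick Python check of the ratios:
{code}
# Sanity-check the heap/container ratios in the mapred-site.xml values above.
map_container_mb, map_xmx_mb = 2048, 1638        # mapreduce.map.memory.mb, -Xmx
reduce_container_mb, reduce_xmx_mb = 4096, 3276  # mapreduce.reduce.memory.mb, -Xmx
io_sort_mb = 820                                 # mapreduce.task.io.sort.mb

print(map_xmx_mb / map_container_mb)        # ~0.80 of the map container
print(reduce_xmx_mb / reduce_container_mb)  # ~0.80 of the reduce container
print(io_sort_mb / map_xmx_mb)              # ~0.50 of the map heap
{code}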
A snippet of hdfs-site.xml:
{code}
<property>
  <name>dfs.blocksize</name>
  <value>1124217344</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.socket.timeout</name>
  <value>1200000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>1200000</value>
</property>
{code}
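At this ~1 GB block size, the 1 TB of generated input should split into roughly a thousand map tasks, i.e. a few waves across the 224 cores. A rough estimate (the 1 TB figure is from the text above; the rest is plain arithmetic):
{code}
# Rough map-task count for ~1 TB of input at the dfs.blocksize above.
block_size_bytes = 1124217344      # dfs.blocksize (~1072 MB)
total_input_bytes = 1 * 1024**4    # ~1 TB of generated test data

splits = -(-total_input_bytes // block_size_bytes)  # ceiling division
print(splits)  # -> ~978 map tasks, about 4-5 waves on 224 cores
{code}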
Below are the results of the last PigMix run in MR mode (L_9, L_10, L_13, L_14
and L_17 fail). The successful scripts take over 6 hours each on average (about
23,800 seconds; see the sketch after the table). I don't know whether L_1~L_17
really need that much time to run. Can you share your configuration and expected
results if you have experience with this benchmark?
||Script||MR time (s)||
|L_1|21544|
|L_2|20482|
|L_3|21629|
|L_4|20905|
|L_5|20738|
|L_6|24131|
|L_7|21983|
|L_8|24549|
|L_9|6585 (fail)|
|L_10|22286 (fail)|
|L_11|21849|
|L_12|21266|
|L_13|11099 (fail)|
|L_14|43 (fail)|
|L_15|23808|
|L_16|42889|
|L_17|10 (fail)|
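For reference, this is how I computed the average over the 12 successful scripts (times in seconds, copied from the table):
{code}
# Average wall time of the successful PigMix scripts from the table above.
times_s = [21544, 20482, 21629, 20905, 20738, 24131,
           21983, 24549, 21849, 21266, 23808, 42889]
avg_s = sum(times_s) / len(times_s)
print(avg_s, avg_s / 3600)  # -> ~23814 s, about 6.6 hours per script
{code}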
> Pigmix hangs when generating data after rows is set as 625000000 in
> test/perf/pigmix/conf/config.sh
> -----------------------------------------------------------------------------------------------------
>
> Key: PIG-4937
> URL: https://issues.apache.org/jira/browse/PIG-4937
> Project: Pig
> Issue Type: Bug
> Reporter: liyunzhang_intel
> Attachments: pigmix1.PNG, pigmix2.PNG
>
>
> Using the default settings in test/perf/pigmix/conf/config.sh, generate data with
> "ant -v -Dharness.hadoop.home=$HADOOP_HOME -Dhadoopversion=23 pigmix-deploy
> >ant.pigmix.deploy".
> It hangs with the following log:
> {code}
> [exec] Generating mapping file for column d:1:100000:z:5 into
> hdfs://bdpe41:8020/user/root/tmp/tmp-1056793210/tmp-786100428
> [exec] processed 99%.
> [exec] Generating input files into
> hdfs://bdpe41:8020/user/root/tmp/tmp-1056793210/tmp595036324
> [exec] Submit hadoop job...
> [exec] 16/06/25 23:06:32 INFO client.RMProxy: Connecting to
> ResourceManager at bdpe41/10.239.47.137:8032
> [exec] 16/06/25 23:06:32 INFO client.RMProxy: Connecting to
> ResourceManager at bdpe41/10.239.47.137:8032
> [exec] 16/06/25 23:06:32 INFO mapred.FileInputFormat: Total input paths
> to process : 90
> [exec] 16/06/25 23:06:32 INFO mapreduce.JobSubmitter: number of splits:90
> [exec] 16/06/25 23:06:32 INFO mapreduce.JobSubmitter: Submitting tokens
> for job: job_1466776148247_0034
> [exec] 16/06/25 23:06:33 INFO impl.YarnClientImpl: Submitted application
> application_1466776148247_0034
> [exec] 16/06/25 23:06:33 INFO mapreduce.Job: The url to track the job:
> http://bdpe41:8088/proxy/application_1466776148247_0034/
> [exec] 16/06/25 23:06:33 INFO mapreduce.Job: Running job: job_1466776148247_0034
> [exec] 16/06/25 23:06:38 INFO mapreduce.Job: Job job_1466776148247_0034
> running in uber mode : false
> [exec] 16/06/25 23:06:38 INFO mapreduce.Job: map 0% reduce 0%
> [exec] 16/06/25 23:06:53 INFO mapreduce.Job: map 2% reduce 0%
> [exec] 16/06/25 23:06:59 INFO mapreduce.Job: map 26% reduce 0%
> [exec] 16/06/25 23:07:00 INFO mapreduce.Job: map 61% reduce 0%
> [exec] 16/06/25 23:07:02 INFO mapreduce.Job: map 62% reduce 0%
> [exec] 16/06/25 23:07:03 INFO mapreduce.Job: map 64% reduce 0%
> [exec] 16/06/25 23:07:04 INFO mapreduce.Job: map 79% reduce 0%
> [exec] 16/06/25 23:07:05 INFO mapreduce.Job: map 86% reduce 0%
> [exec] 16/06/25 23:07:06 INFO mapreduce.Job: map 92% reduce 0%
> {code}
> When I set rows to 625000 in test/perf/pigmix/conf/config.sh, the test data is
> generated successfully. So is the problem caused by limited resources (disk
> space or something else)? My environment is a 3-node cluster (each node has a
> single disk of about 830 GB), and I assign memory and CPU in yarn-site.xml as
> follows:
> {code}
> yarn.nodemanager.resource.memory-mb=56G
> yarn.nodemanager.resource.cpu-vcores=28
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)