[
https://issues.apache.org/jira/browse/PIG-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liyunzhang_intel updated PIG-4937:
----------------------------------
Description:
Using the default settings in test/perf/pigmix/conf/config.sh, I generate data with
"ant -v -Dharness.hadoop.home=$HADOOP_HOME -Dhadoopversion=23 pigmix-deploy
>ant.pigmix.deploy"
and it hangs. The log shows:
{code}
[exec] Generating mapping file for column d:1:100000:z:5 into
hdfs://bdpe41:8020/user/root/tmp/tmp-1056793210/tmp-786100428
[exec] processed 99%.
[exec] Generating input files into
hdfs://bdpe41:8020/user/root/tmp/tmp-1056793210/tmp595036324
[exec] Submit hadoop job...
[exec] 16/06/25 23:06:32 INFO client.RMProxy: Connecting to
ResourceManager at bdpe41/10.239.47.137:8032
[exec] 16/06/25 23:06:32 INFO client.RMProxy: Connecting to
ResourceManager at bdpe41/10.239.47.137:8032
[exec] 16/06/25 23:06:32 INFO mapred.FileInputFormat: Total input paths to
process : 90
[exec] 16/06/25 23:06:32 INFO mapreduce.JobSubmitter: number of splits:90
[exec] 16/06/25 23:06:32 INFO mapreduce.JobSubmitter: Submitting tokens
for job: job_1466776148247_0034
[exec] 16/06/25 23:06:33 INFO impl.YarnClientImpl: Submitted application
application_1466776148247_0034
[exec] 16/06/25 23:06:33 INFO mapreduce.Job: The url to track the job:
http://bdpe41:8088/proxy/application_1466776148247_0034/
[exec] 16/06/25 23:06:33 INFO mapreduce.Job: Running job: job_1466776148247_0034
[exec] 16/06/25 23:06:38 INFO mapreduce.Job: Job job_1466776148247_0034
running in uber mode : false
[exec] 16/06/25 23:06:38 INFO mapreduce.Job: map 0% reduce 0%
[exec] 16/06/25 23:06:53 INFO mapreduce.Job: map 2% reduce 0%
[exec] 16/06/25 23:06:59 INFO mapreduce.Job: map 26% reduce 0%
[exec] 16/06/25 23:07:00 INFO mapreduce.Job: map 61% reduce 0%
[exec] 16/06/25 23:07:02 INFO mapreduce.Job: map 62% reduce 0%
[exec] 16/06/25 23:07:03 INFO mapreduce.Job: map 64% reduce 0%
[exec] 16/06/25 23:07:04 INFO mapreduce.Job: map 79% reduce 0%
[exec] 16/06/25 23:07:05 INFO mapreduce.Job: map 86% reduce 0%
[exec] 16/06/25 23:07:06 INFO mapreduce.Job: map 92% reduce 0%
{code}
When I set rows to 625000 in test/perf/pigmix/conf/config.sh, the test data is
generated successfully. So could the problem be limited resources (disk space or
something else)? My environment is a 3-node cluster (each node has one disk of
about 830G), and I assign memory and CPU in yarn-site.xml as follows:
{code}
yarn.nodemanager.resource.memory-mb=56G
yarn.nodemanager.resource.cpu-vcores=28
{code}
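To sanity-check the disk-space theory, a rough back-of-envelope estimate can be compared against the cluster's free HDFS capacity. The per-row size below is an assumption for illustration, not a measured PigMix figure (actual row width varies by table):

{code}
# Rough estimate, assuming ~500 bytes per generated row (an assumption)
# and the HDFS default replication factor of 3.
ROWS=625000000
BYTES_PER_ROW=500
REPLICATION=3
RAW_GB=$(( ROWS * BYTES_PER_ROW / 1024 / 1024 / 1024 ))
TOTAL_GB=$(( RAW_GB * REPLICATION ))
echo "raw: ~${RAW_GB}G, with ${REPLICATION}x replication: ~${TOTAL_GB}G"
# Compare against what HDFS actually has free (run on the cluster):
if command -v hdfs >/dev/null; then
  hdfs dfsadmin -report | grep "DFS Remaining"
fi
{code}

Under that assumed row width this comes to roughly 873G after replication, a large fraction of the ~2.5T the three 830G disks provide before counting map-side spill space, so disk exhaustion at 625000000 rows is at least plausible.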
> Pigmix hangs when generating data after rows is set as 625000000 in
> test/perf/pigmix/conf/config.sh
> -----------------------------------------------------------------------------------------------------
>
> Key: PIG-4937
> URL: https://issues.apache.org/jira/browse/PIG-4937
> Project: Pig
> Issue Type: Bug
> Reporter: liyunzhang_intel
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)