[ https://issues.apache.org/jira/browse/PIG-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351360#comment-15351360 ]

Rohini Palaniswamy commented on PIG-4937:
-----------------------------------------

How much disk space do you have? The input dataset is 1TB, and a couple of the 
tests generate the same amount of data. There is an option to clean up output 
right after each test; even if you use that, you will need at least 3TB of HDFS 
space (9TB including a replication factor of 3). I would suggest adding more 
nodes if possible. 3 nodes is too small and might take 1-2 days to finish 
running the tests.
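
As a rough back-of-envelope check (assuming the 1TB input above, a couple of 
test outputs of roughly the same size, and the 3-node/830G cluster described 
below; actual overhead may differ):

{code}
input dataset:            1 TB
test outputs:           + 2 TB   (a couple of tests each write ~1 TB)
                        ------
logical minimum:          3 TB
x replication factor 3:   9 TB of raw HDFS capacity needed

3 nodes x ~830 GB/node: ~2.4 TB raw, well short of 9 TB
{code}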

For datagen, you don't need mapreduce.task.io.sort.mb set to 1G, so you can 
reduce that and also reduce the container size to 1G.
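
For example, something like this (a sketch using standard Hadoop 2.x property 
names; the exact values here are assumptions, tune them to your jobs):

{code}
# Shrink the datagen map containers, e.g. via -D options or mapred-site.xml:
mapreduce.task.io.sort.mb=256      # datagen does not need a 1G sort buffer
mapreduce.map.memory.mb=1024       # 1G container
mapreduce.map.java.opts=-Xmx800m   # heap comfortably below the container size
{code}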

 

> Pigmix hangs when generating data after rows is set to 625000000 in 
> test/perf/pigmix/conf/config.sh
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-4937
>                 URL: https://issues.apache.org/jira/browse/PIG-4937
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>         Attachments: pigmix1.PNG, pigmix2.PNG
>
>
> Using the default settings in test/perf/pigmix/conf/config.sh, I generate data with
> "ant -v -Dharness.hadoop.home=$HADOOP_HOME -Dhadoopversion=23 pigmix-deploy 
> >ant.pigmix.deploy"
> and it hangs. The log shows:
> {code}
>  [exec] Generating mapping file for column d:1:100000:z:5 into 
> hdfs://bdpe41:8020/user/root/tmp/tmp-1056793210/tmp-786100428
>      [exec] processed 99%.
>      [exec] Generating input files into 
> hdfs://bdpe41:8020/user/root/tmp/tmp-1056793210/tmp595036324
>      [exec] Submit hadoop job...
>      [exec] 16/06/25 23:06:32 INFO client.RMProxy: Connecting to 
> ResourceManager at bdpe41/10.239.47.137:8032
>      [exec] 16/06/25 23:06:32 INFO client.RMProxy: Connecting to 
> ResourceManager at bdpe41/10.239.47.137:8032
>      [exec] 16/06/25 23:06:32 INFO mapred.FileInputFormat: Total input paths 
> to process : 90
>      [exec] 16/06/25 23:06:32 INFO mapreduce.JobSubmitter: number of splits:90
>      [exec] 16/06/25 23:06:32 INFO mapreduce.JobSubmitter: Submitting tokens 
> for job: job_1466776148247_0034
>      [exec] 16/06/25 23:06:33 INFO impl.YarnClientImpl: Submitted application 
> application_1466776148247_0034
>      [exec] 16/06/25 23:06:33 INFO mapreduce.Job: The url to track the job: 
> http://bdpe41:8088/proxy/application_1466776148247_0034/
>      [exec] 16/06/25 23:06:33 INFO mapreduce.Job: Running job: job_1466776148247_0034
>      [exec] 16/06/25 23:06:38 INFO mapreduce.Job: Job job_1466776148247_0034 
> running in uber mode : false
>      [exec] 16/06/25 23:06:38 INFO mapreduce.Job:  map 0% reduce 0%
>      [exec] 16/06/25 23:06:53 INFO mapreduce.Job:  map 2% reduce 0%
>      [exec] 16/06/25 23:06:59 INFO mapreduce.Job:  map 26% reduce 0%
>      [exec] 16/06/25 23:07:00 INFO mapreduce.Job:  map 61% reduce 0%
>      [exec] 16/06/25 23:07:02 INFO mapreduce.Job:  map 62% reduce 0%
>      [exec] 16/06/25 23:07:03 INFO mapreduce.Job:  map 64% reduce 0%
>      [exec] 16/06/25 23:07:04 INFO mapreduce.Job:  map 79% reduce 0%
>      [exec] 16/06/25 23:07:05 INFO mapreduce.Job:  map 86% reduce 0%
>      [exec] 16/06/25 23:07:06 INFO mapreduce.Job:  map 92% reduce 0%
> {code}
> When I use 625000 as the rows in test/perf/pigmix/conf/config.sh, the test 
> data is generated successfully. So is the problem limited resources (disk 
> size or something else)? My environment is a 3-node cluster (each node has 
> one ~830G disk), and I assign memory and CPU in yarn-site.xml as follows:
> {code}
>  yarn.nodemanager.resource.memory-mb=56G
>  yarn.nodemanager.resource.cpu-vcores=28
> {code}



