[ https://issues.apache.org/jira/browse/PIG-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246099#comment-15246099 ]

Xuefu Zhang commented on PIG-4846:
----------------------------------

The basic idea is to make maximum use of the given resources (memory and CPU).
Depending on which is scarce, you want to use up the scarce one first; in your
case, memory. In general, you want at least 2G of memory per core for Spark,
and 4, 5, or 6 cores per executor. In our case, we set 4 cores and 8G of memory
per executor. Of the executor memory, in general, 15-20% goes to memory
overhead. Driver memory is less critical unless there is an OOM, which calls
for more memory; 2G is a good minimum.
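
To make that concrete, here is a minimal sketch of those numbers as Spark
properties. The property names are the standard Spark-on-YARN ones from the
1.x line; the exact values, and whether Pig on Spark picks them up from
spark-defaults.conf or from its own properties file, are assumptions you should
adjust to your cluster:

    # Sketch only: example sizing from the discussion above,
    # not a recommendation for every cluster.
    spark.executor.cores                  4
    spark.executor.memory                 8g
    # Roughly 15-20% of executor memory set aside as overhead (YARN deployments).
    spark.yarn.executor.memoryOverhead    1536
    spark.driver.memory                   2g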

For more details, I wrote a doc that was included in CDH 5.7 for Hive on Spark:
http://www.cloudera.com/documentation/enterprise/latest/topics/admin_hos_tuning.html
While that's for Hive on Spark, some of the configurations may apply to Pig as
well.

Let me know if you have more questions.

> Use pigmix to test the performance of pig on spark
> --------------------------------------------------
>
>                 Key: PIG-4846
>                 URL: https://issues.apache.org/jira/browse/PIG-4846
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4846.patch, PIG-4846_1.patch
>
>
> We can compare the performance between MR and Spark mode with PigMix.
> An introduction to PigMix is at
> https://cwiki.apache.org/confluence/display/PIG/PigMix.
> PIG-4846.patch makes PigMix run with a specified exectype (illustrated in the
> sketch below the quoted description).
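
As a rough illustration of what running PigMix with a specified exectype looks
like (the invocation here is hypothetical; the patch may drive this through the
PigMix harness rather than the pig client directly):

    # Same PigMix script, once in MapReduce mode and once in Spark mode,
    # so the two runtimes can be compared on identical input.
    pig -x mr    L1.pig
    pig -x spark L1.pig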



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
