[
https://issues.apache.org/jira/browse/PIG-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236875#comment-15236875
]
liyunzhang_intel commented on PIG-4846:
---------------------------------------
[~rohini],[~mohitsabharwal],[~xuefuz],[~pallavi.rao],[~kexianda]:
Here is a comparison of the pigmix results in MR and Spark (yarn-client) mode.
I modified pig/test/perf/pigmix/conf/config.sh; I ran into a problem when rows
is set to 625000, maybe because the data is too big.
{code}
....
# ~1600 bytes per row for page_views (it is the base for most other inputs)
#rows=625000000
rows=625000
...
{code}
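For a sense of scale, the row counts above can be turned into rough input sizes using the ~1600 bytes per row figure from the config.sh comment. This is only a back-of-the-envelope sketch: the helper names are hypothetical, and the estimate covers page_views alone, not the other generated inputs.

```python
# Rough input-size estimate for page_views, based on the
# "~1600 bytes per row" comment in config.sh (an approximation).
BYTES_PER_ROW = 1600


def page_views_bytes(rows, bytes_per_row=BYTES_PER_ROW):
    """Approximate size of page_views in bytes (hypothetical helper)."""
    return rows * bytes_per_row


def human(n_bytes):
    """Format a byte count with decimal units (1 GB = 10**9 bytes)."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1000 or unit == "TB":
            return f"{n_bytes:g} {unit}"
        n_bytes /= 1000


for rows in (625000, 625000000):
    print(rows, "rows ->", human(page_views_bytes(rows)))
```

Under that assumption, rows=625000 gives about 1 GB for page_views, and the commented-out rows=625000000 gives about 1 TB.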
*Test environment*: one machine with 60 GB of memory, running Hadoop and Spark
on a single node.
{code}
#jps
193152 HistoryServer
198016 NameNode
142148 NodeManager
198251 DataNode
192928 Worker
210563 Launcher
236090 Jps
198457 SecondaryNameNode
192702 Master
142020 ResourceManager
{code}
The NodeManager memory is set to 10 GB:
{code}
#grep -C2 "yarn.nodemanager.resource.memory-mb" $HADOOP_HOME/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>10240</value>
<description>the amount of memory on the NodeManager in MB</description>
{code}
The Spark executor memory is set to 10 GB:
{code}
#grep -C2 "executor.memory" conf/spark-defaults.conf
spark.executor.memory 10g
{code}
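One thing that may be worth checking with these two settings: on YARN, Spark asks for the executor memory plus an off-heap overhead per container. The sketch below assumes the default rule described in the Spark on YARN documentation (the larger of 384 MB and 10% of the executor memory). The exact factor depends on the Spark version, the function names are hypothetical, and whether this applies here depends on how the job is actually launched.

```python
# Sketch: does one Spark executor container fit into the NodeManager?
# ASSUMPTION: overhead = max(384 MB, 10% of executor memory); the factor
# varies between Spark versions and can be overridden in the Spark config.

def yarn_container_request_mb(executor_memory_mb,
                              overhead_fraction=0.10,
                              min_overhead_mb=384):
    """Memory Spark would request from YARN for one executor, in MB."""
    overhead = max(min_overhead_mb, int(executor_memory_mb * overhead_fraction))
    return executor_memory_mb + overhead


def fits(executor_memory_mb, nodemanager_memory_mb):
    """True if a single executor container fits into the NodeManager."""
    return yarn_container_request_mb(executor_memory_mb) <= nodemanager_memory_mb


nodemanager_mb = 10240   # yarn.nodemanager.resource.memory-mb
executor_mb = 10 * 1024  # spark.executor.memory 10g

print("request (MB):", yarn_container_request_mb(executor_mb))
print("fits:", fits(executor_mb, nodemanager_mb))
```

Under these assumptions a 10g executor requests more than the 10240 MB the NodeManager offers, so a smaller spark.executor.memory or a larger NodeManager limit might be needed.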
The results are:
||Script||MR||Spark||
|L_1|208|71|
|L_2|340|46|
|L_3|390|1746|
|L_4|202|49|
|L_5|197|1754|
|L_6|202|60|
|L_7|202|1002|
|L_8|177|46|
|L_9|543|65|
|L_10|543|67|
|L_11|669|196|
|L_12|202|56|
|L_13|192|1010|
|L_14|365|52|
|L_15|202|1018|
|L_16|202|1022|
|L_17|202|109|
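Spark performs better than MR on some scripts (for example L_2: 46 vs 340) but
much worse on others (L_3, L_5, L_7, L_13, L_15 and L_16 all take more than
1000 on Spark, compared with 192 to 390 on MR).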
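To make the comparison easier to read, the MR/Spark ratio can be computed per script. The numbers below are copied from the table; the variable names are only for illustration.

```python
# Runtimes copied from the results table: script -> (MR, Spark).
RESULTS = {
    "L_1": (208, 71),    "L_2": (340, 46),    "L_3": (390, 1746),
    "L_4": (202, 49),    "L_5": (197, 1754),  "L_6": (202, 60),
    "L_7": (202, 1002),  "L_8": (177, 46),    "L_9": (543, 65),
    "L_10": (543, 67),   "L_11": (669, 196),  "L_12": (202, 56),
    "L_13": (192, 1010), "L_14": (365, 52),   "L_15": (202, 1018),
    "L_16": (202, 1022), "L_17": (202, 109),
}

# Split the scripts by which engine finished faster.
spark_faster = [s for s, (mr, spark) in RESULTS.items() if spark < mr]
spark_slower = [s for s, (mr, spark) in RESULTS.items() if spark > mr]

# A ratio above 1 means Spark was faster; below 1 means MR was faster.
for script, (mr, spark) in RESULTS.items():
    print(f"{script:5} MR={mr:4} Spark={spark:4} MR/Spark={mr / spark:.2f}")

print("Spark faster:", len(spark_faster), "scripts")
print("Spark slower:", ", ".join(spark_slower))
```

Spark is faster on 11 of the 17 scripts and slower on the remaining 6.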
I do not have much experience with Spark tuning. If there is any problem with
the configuration, please tell me.
> Use pigmix to test the performance of pig on spark
> --------------------------------------------------
>
> Key: PIG-4846
> URL: https://issues.apache.org/jira/browse/PIG-4846
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4846.patch, PIG-4846_1.patch
>
>
> We can compare the performance of MR and Spark mode using pigmix.
> An introduction to pigmix is at
> https://cwiki.apache.org/confluence/display/PIG/PigMix.
> PIG-4846.patch makes pigmix run with a specified exectype.