[
https://issues.apache.org/jira/browse/PIG-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liyunzhang_intel updated PIG-4846:
----------------------------------
Attachment: PIG-4846.patch
The steps to test with pigmix are:
1. generate data
#ant clean -Dharness.hadoop.home=$HADOOP_HOME -Dhadoopversion=23 pigmix-deploy
You can change the size of the generated data. To make the test
data smaller, edit pig/test/perf/pigmix/conf/config.sh:
{code}
# ~1600 bytes per row for page_views (it is the base for most other inputs)
#rows=625000000
rows=625
{code}
2. run in mr and spark mode
ant -Dharness.hadoop.home=$HADOOP_HOME -Dhadoopversion=23 -Dexectype=mr pigmix > ant.pigmix.mr
ant -Dharness.hadoop.home=$HADOOP_HOME -Dhadoopversion=23 -Dexectype=spark pigmix > ant.pigmix.spark
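The two steps above could be wrapped in a small script along these lines. This is only a sketch: the function names set_pigmix_rows and run_pigmix are hypothetical helpers, not part of the patch, and it assumes config.sh contains a single uncommented rows= line as shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of PIG-4846.patch); a sketch only.

# Set the active rows= value in a pigmix config.sh.
# Commented lines such as "#rows=625000000" are left untouched.
set_pigmix_rows() {
    cfg_file=$1
    new_rows=$2
    sed "s/^rows=.*/rows=${new_rows}/" "$cfg_file" > "$cfg_file.tmp" &&
        mv "$cfg_file.tmp" "$cfg_file"
}

# Run pigmix in the given exectype (mr or spark), write the ant output
# to ant.pigmix.<mode>, and print the elapsed wall-clock seconds.
run_pigmix() {
    mode=$1
    start=$(date +%s)
    ant -Dharness.hadoop.home="$HADOOP_HOME" -Dhadoopversion=23 \
        -Dexectype="$mode" pigmix > "ant.pigmix.$mode"
    end=$(date +%s)
    echo "$mode: $((end - start)) s"
}
```

For example, after sourcing the script from the pig checkout: set_pigmix_rows test/perf/pigmix/conf/config.sh 625, then run_pigmix mr and run_pigmix spark. The printed times are wall-clock for the whole ant target, so they are only a rough comparison.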
> Use pigmix to test the performance of pig on spark
> --------------------------------------------------
>
> Key: PIG-4846
> URL: https://issues.apache.org/jira/browse/PIG-4846
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4846.patch
>
>
> We can use pigmix to compare performance between mr and spark mode.
> An introduction to pigmix is at
> https://cwiki.apache.org/confluence/display/PIG/PigMix.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)