[ 
https://issues.apache.org/jira/browse/PIG-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4846:
----------------------------------
    Attachment: PIG-4846.patch

The steps to test with pigmix are:
1. Generate the data:
ant clean -Dharness.hadoop.home=$HADOOP_HOME -Dhadoopversion=23 pigmix-deploy
The size of the generated data is configurable. To make the test data 
smaller, edit pig/test/perf/pigmix/conf/config.sh:
{code}
# ~1600 bytes per row for page_views (it is the base for most other inputs)
#rows=625000000
rows=625
{code}
2. Run in mr and spark mode:
ant -Dharness.hadoop.home=$HADOOP_HOME -Dhadoopversion=23 -Dexectype=mr pigmix > ant.pigmix.mr

ant -Dharness.hadoop.home=$HADOOP_HOME -Dhadoopversion=23 -Dexectype=spark pigmix > ant.pigmix.spark
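
The two steps above can be wrapped in a small script, since the two runs 
differ only in -Dexectype. This is a minimal sketch, not part of the patch: 
it assumes it is sourced from the Pig source root with HADOOP_HOME set, and 
the names set_pigmix_rows, pigmix_cmd, run_pigmix and the DRY_RUN variable 
are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: set the active "rows=" value in a PigMix config file.
# Only the uncommented rows= line is rewritten; "#rows=..." is left alone.
set_pigmix_rows() {
    conf=$1
    rows=$2
    sed "s/^rows=.*/rows=$rows/" "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Hypothetical helper: print the ant command line for one execution mode.
pigmix_cmd() {
    mode=$1
    printf 'ant -Dharness.hadoop.home=%s -Dhadoopversion=23 -Dexectype=%s pigmix\n' \
        "$HADOOP_HOME" "$mode"
}

# Run the pigmix target for one mode and write the log to ant.pigmix.<mode>,
# as in step 2. With DRY_RUN=1 the command is only printed, not executed.
run_pigmix() {
    mode=$1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        pigmix_cmd "$mode"
    else
        ant -Dharness.hadoop.home="$HADOOP_HOME" -Dhadoopversion=23 \
            -Dexectype="$mode" pigmix > "ant.pigmix.$mode"
    fi
}
```

For example, "set_pigmix_rows pig/test/perf/pigmix/conf/config.sh 625" 
followed by "run_pigmix mr" and "run_pigmix spark" should produce the same 
two log files as the commands in step 2. Setting DRY_RUN=1 first prints the 
ant commands without invoking ant.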

   


> Use pigmix to test the performance of pig on spark
> --------------------------------------------------
>
>                 Key: PIG-4846
>                 URL: https://issues.apache.org/jira/browse/PIG-4846
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4846.patch
>
>
> We can compare the performance of mr and spark mode using pigmix.
> An introduction to pigmix is available at 
> https://cwiki.apache.org/confluence/display/PIG/PigMix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
