[ https://issues.apache.org/jira/browse/PIG-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mohit Sabharwal updated PIG-4190:
---------------------------------
Attachment: PIG-4190.2.patch
> Implement replicated join in Spark engine
> -----------------------------------------
>
> Key: PIG-4190
> URL: https://issues.apache.org/jira/browse/PIG-4190
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Praveen Rachabattuni
> Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
> Attachments: PIG-4190.1.patch, PIG-4190.2.patch, PIG-4190.patch
>
>
> Related e2e tests: Union_7, Union_8, Union_13
> Sample script:
> a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
> b = load '/user/pig/tests/data/singlefile/studentcolon10k' using PigStorage(':') as (name, age, gpa);
> c = union a, b;
> d = load '/user/pig/tests/data/singlefile/votertab10k' as (name, age, registration, contributions);
> e = join c by name, d by name using 'replicated';
> store e into '/user/pig/out/praveenr-1411380943-nightly.conf/Union_7.out';