liyunzhang_intel created PIG-4891:
-------------------------------------
Summary: Implement FR join by broadcasting small RDD instead of making
more copies of data
Key: PIG-4891
URL: https://issues.apache.org/jira/browse/PIG-4891
Project: Pig
Issue Type: Sub-task
Components: spark
Reporter: liyunzhang_intel
In the current implementation of FRJoin (PIG-4771), we simply set the
replication of the data to 10 to make data access more efficient, because the
existing FRJoin algorithm can be reused this way. We need to figure out how to
implement FRJoin in the current code base by broadcasting the small RDD, if we
find that broadcasting improves performance significantly.
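
To make the proposal concrete, here is a minimal sketch of the idea behind a
broadcast-based fragment-replicate join, written in plain Python with no Spark
dependency. It is not the Pig on Spark implementation. The function names
(build_lookup, join_partition) and the sample data are hypothetical, and the
large relation is modeled as a list of in-memory partitions. The small relation
is hashed once by join key, and each partition of the large relation probes
that shared table, so no shuffle and no extra copies of the data are needed.

```python
from collections import defaultdict


def build_lookup(small_relation, key_index=0):
    """Hash the small relation by join key (the structure that would be broadcast)."""
    lookup = defaultdict(list)
    for row in small_relation:
        lookup[row[key_index]].append(row)
    return dict(lookup)


def join_partition(partition, lookup, key_index=0):
    """Inner-join one partition of the large relation by probing the lookup."""
    for row in partition:
        for match in lookup.get(row[key_index], []):
            yield row + match


# Hypothetical data: the large relation is split into two partitions.
large_partitions = [
    [(1, "a"), (2, "b")],
    [(2, "c"), (3, "d")],
]
small_relation = [(1, "x"), (2, "y"), (2, "z")]

# Built once, then shared read-only by every partition.
lookup = build_lookup(small_relation)
joined = [
    out
    for partition in large_partitions
    for out in join_partition(partition, lookup)
]
```

In Spark, the lookup table would presumably be shipped to executors as a
broadcast variable (SparkContext.broadcast) and the probe would run inside a
per-partition operation such as mapPartitions. How that fits the existing
FRJoin operators is exactly the open question of this issue.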
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)