[ https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638956#comment-16638956 ]

Satish Subhashrao Saley commented on PIG-5342:
----------------------------------------------

Could you please amend the commit? The BloomFilterPartitioner class wasn't 
committed, so the build fails with the errors below.
{code:java}
     [echo] *** Building Main Sources ***
     [echo] *** To compile with all warnings enabled, supply -Dall.warnings=1 on command line ***
     [echo] *** Else, you will only be warned about deprecations ***
     [echo] *** Hadoop version used: 2 ; HBase version used: 1 ; Spark version used: 2 ***
    [javac] Compiling 1106 source files to /Users/saley/src/pig/build/classes
    [javac] /Users/saley/src/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java:113: error: cannot find symbol
    [javac] import org.apache.pig.backend.hadoop.executionengine.tez.runtime.BloomFilterPartitioner;
    [javac]                                                                 ^
    [javac]   symbol:   class BloomFilterPartitioner
    [javac]   location: package org.apache.pig.backend.hadoop.executionengine.tez.runtime
    [javac] /Users/saley/src/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java:1495: error: cannot find symbol
    [javac]             edge.partitionerClass = BloomFilterPartitioner.class;
    [javac]                                     ^
    [javac]   symbol:   class BloomFilterPartitioner
    [javac]   location: class TezCompiler
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 2 errors
{code}

> Add setting to turn off bloom join combiner
> -------------------------------------------
>
>                 Key: PIG-5342
>                 URL: https://issues.apache.org/jira/browse/PIG-5342
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Major
>             Fix For: 0.18.0
>
>         Attachments: PIG-5342-1.patch, PIG-5342-2.patch, PIG-5342-3.patch, 
> PIG-5342-4.patch, PIG-5342-5.patch, PIG-5342-6.patch, PIG-5342-7.patch, 
> PIG-5342-8.patch
>
>
> 1) Need a new setting, pig.bloomjoin.nocombiner, to turn off the combiner for 
> bloom join. When the keys are all unique, the combiner is unnecessary overhead.
> 2) Previously, the keys were the bloom filter index and the values were the 
> join key. Combining involved doing a distinct on the bag of values, which 
> has memory issues for more than 10 million records. That needs to be flipped 
> and a distinct combiner used to scale to billions of records.
> 3) Mention in the documentation that bloom join is also ideal for right 
> outer joins with the smaller dataset on the right. Replicated join only 
> supports left outer join.
>  
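
To illustrate point 2 of the description, here is a rough sketch in plain Java (not Pig's actual combiner code) of why flipping the key and value might help. The class and method names are hypothetical, and the hash-modulo filter index is an assumption made only for illustration. With the filter index as the key, deduplication has to hold a set of join keys per index in memory. With the join key as the key and input arriving sorted, as a combiner would typically see it, a distinct only needs to remember the previous key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; these are not Pig classes.
public class BloomCombineSketch {

    // Old layout: key = bloom filter index, value = join key.
    // Deduplicating requires materializing every join key of an index in memory.
    static Map<Integer, Set<String>> combineByFilterIndex(List<String> joinKeys, int numFilters) {
        Map<Integer, Set<String>> byIndex = new TreeMap<>();
        for (String key : joinKeys) {
            // Assumed index function, chosen only for this sketch.
            int index = Math.floorMod(key.hashCode(), numFilters);
            byIndex.computeIfAbsent(index, i -> new HashSet<>()).add(key);
        }
        return byIndex;
    }

    // Flipped layout: key = join key, input sorted by key.
    // A distinct pass keeps only the previous key, so memory use stays constant.
    static List<String> distinctSorted(List<String> sortedJoinKeys) {
        List<String> distinct = new ArrayList<>();
        String previous = null;
        for (String key : sortedJoinKeys) {
            if (!key.equals(previous)) {
                distinct.add(key);
                previous = key;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>(Arrays.asList("a", "b", "a", "c", "b"));
        System.out.println(combineByFilterIndex(keys, 2));
        Collections.sort(keys);
        System.out.println(distinctSorted(keys));
    }
}
```

Both methods produce the same set of distinct join keys. The difference is that the first holds a whole bag per filter index, while the second streams over sorted input.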



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
