[ https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638956#comment-16638956 ]

Satish Subhashrao Saley commented on PIG-5342:
----------------------------------------------

Could you please amend the commit? BloomFilterPartitioner class wasn't committed.

{code:java}
     [echo] *** Building Main Sources ***
     [echo] *** To compile with all warnings enabled, supply -Dall.warnings=1 on command line ***
     [echo] *** Else, you will only be warned about deprecations ***
     [echo] *** Hadoop version used: 2 ; HBase version used: 1 ; Spark version used: 2 ***
    [javac] Compiling 1106 source files to /Users/saley/src/pig/build/classes
    [javac] /Users/saley/src/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java:113: error: cannot find symbol
    [javac] import org.apache.pig.backend.hadoop.executionengine.tez.runtime.BloomFilterPartitioner;
    [javac]                                                                 ^
    [javac]   symbol:   class BloomFilterPartitioner
    [javac]   location: package org.apache.pig.backend.hadoop.executionengine.tez.runtime
    [javac] /Users/saley/src/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java:1495: error: cannot find symbol
    [javac] edge.partitionerClass = BloomFilterPartitioner.class;
    [javac]                         ^
    [javac]   symbol:   class BloomFilterPartitioner
    [javac]   location: class TezCompiler
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 2 errors
{code}

> Add setting to turn off bloom join combiner
> -------------------------------------------
>
>                 Key: PIG-5342
>                 URL: https://issues.apache.org/jira/browse/PIG-5342
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Major
>             Fix For: 0.18.0
>
>         Attachments: PIG-5342-1.patch, PIG-5342-2.patch, PIG-5342-3.patch,
> PIG-5342-4.patch, PIG-5342-5.patch, PIG-5342-6.patch, PIG-5342-7.patch,
> PIG-5342-8.patch
>
>
> 1) Need a new setting pig.bloomjoin.nocombiner to turn off the combiner for
> bloom join. When the keys are all unique, the combiner is unnecessary overhead.
> 2) Previously, the keys were the bloom filter index and the values were the
> join keys. Combining involved doing a distinct on the bag of values, which has
> memory issues for more than 10 million records. That needs to be flipped, and
> a distinct combiner used, to scale to billions of records.
> 3) Mention in the documentation that bloom join is also ideal for right outer
> joins with the smaller dataset on the right. Replicated join only supports
> left outer join.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
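A minimal, hypothetical Java sketch of the memory difference described in item 2 above. It is not the code from the PIG-5342 patches: `BloomCombinerLayouts`, `partitionFor`, and `NUM_FILTERS` are illustrative names, and the sort that a sort-based shuffle performs before the combiner runs is simulated with `Collections.sort`. In the old layout, deduplicating the bag of values means holding every distinct join key for a bloom filter index at once; in the flipped layout the join key itself is the shuffle key, so deduplicating sorted keys only needs the previous key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the two combiner layouts in PIG-5342 item 2;
// this is not Pig's BloomFilterPartitioner or its distinct combiner.
public class BloomCombinerLayouts {

    // Hypothetical number of bloom filters the join keys are spread across.
    static final int NUM_FILTERS = 4;

    // Hypothetical partitioner: maps a join key to a bloom filter index.
    static int partitionFor(String joinKey) {
        return Math.floorMod(joinKey.hashCode(), NUM_FILTERS);
    }

    // Old layout: key = bloom filter index, value = join key.
    // Deduplicating the bag of values holds every distinct join key
    // for an index in memory at the same time.
    static Map<Integer, Set<String>> oldLayoutCombine(List<String> joinKeys) {
        Map<Integer, Set<String>> bagsByIndex = new HashMap<>();
        for (String key : joinKeys) {
            bagsByIndex.computeIfAbsent(partitionFor(key), i -> new HashSet<>()).add(key);
        }
        return bagsByIndex;
    }

    // Flipped layout: key = join key (a real job would route it with partitionFor).
    // Keys arrive sorted, so the distinct pass only compares each key with the
    // previous one and keeps constant state apart from what it emits.
    static List<String> newLayoutDistinct(List<String> sortedJoinKeys) {
        List<String> distinct = new ArrayList<>();
        String previous = null;
        for (String key : sortedJoinKeys) {
            if (!key.equals(previous)) {
                distinct.add(key);
                previous = key;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        List<String> joinKeys = Arrays.asList("k3", "k1", "k2", "k1", "k3", "k3");
        System.out.println("old layout, bags by index: " + oldLayoutCombine(joinKeys));

        // Simulate the shuffle sorting map output keys before the combiner runs.
        List<String> sorted = new ArrayList<>(joinKeys);
        Collections.sort(sorted);
        System.out.println("flipped layout, distinct keys: " + newLayoutDistinct(sorted));
    }
}
```

When every join key is already unique (item 1), the distinct pass removes nothing, which is the overhead that pig.bloomjoin.nocombiner is meant to let users skip.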