[ 
https://issues.apache.org/jira/browse/CRUNCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201375#comment-15201375
 ] 

Gabriel Reid commented on CRUNCH-596:
-------------------------------------

First off, sorry I missed your long-standing pull request for this -- I saw it 
pass by a while back and didn't get on it.

This looks really good -- a great solution for something that I was pretty much 
convinced wasn't possible.

Very good point in the javadoc about dealing with the deep-copying that 
will occur when the two filters are used to split the right side into two 
PCollections. However, I think that we can get around that by putting the 
"right" PCollection through a dummy parallelDo call with a DoFn that returns 
true for {{disableDeepCopy()}}. This will stop the deep copying that would 
happen otherwise before the values are passed to the two filter functions, and 
then there wouldn't be any need to set DISABLE_DEEP_COPY globally any more to 
get decent performance. Would you be able to update the patch with that little 
change? Other than that, this looks like it's good to go.
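
A minimal sketch of the dummy pass-through function described above, offered as an illustration rather than the code that should go into the patch. To stay self-contained, it declares small stand-in versions of Crunch's DoFn and Emitter (only the parts used here); in the patch the function would extend org.apache.crunch.DoFn instead. The names NoDeepCopySketch and NoDeepCopyIdentityFn are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class NoDeepCopySketch {

    // Stand-in for org.apache.crunch.Emitter (assumption: only emit() is modeled).
    public interface Emitter<T> {
        void emit(T value);
    }

    // Stand-in for org.apache.crunch.DoFn, reduced to the two methods used here.
    // As in Crunch, deep copying is enabled unless a subclass opts out.
    public abstract static class DoFn<S, T> {
        public abstract void process(S input, Emitter<T> emitter);

        public boolean disableDeepCopy() {
            return false;
        }
    }

    // Hypothetical identity function: emits every input unchanged and asks the
    // framework not to deep-copy values before handing them to downstream functions.
    public static class NoDeepCopyIdentityFn<T> extends DoFn<T, T> {
        @Override
        public void process(T input, Emitter<T> emitter) {
            emitter.emit(input);
        }

        @Override
        public boolean disableDeepCopy() {
            return true;
        }
    }

    public static void main(String[] args) {
        NoDeepCopyIdentityFn<String> fn = new NoDeepCopyIdentityFn<>();
        List<String> out = new ArrayList<>();
        for (String value : List.of("a", "b", "c")) {
            fn.process(value, out::add); // values pass through untouched
        }
        System.out.println("emitted: " + out);
        System.out.println("disableDeepCopy: " + fn.disableDeepCopy());
    }
}
```

In the join strategy itself, the idea would be to apply such a function to the right side once, with something like {{right.parallelDo(new NoDeepCopyIdentityFn<...>(), right.getPTableType())}}, and then run the two filters on the result. The exact call site depends on how the patch is structured.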



> Right and full outer join for Bloom filter strategy
> ---------------------------------------------------
>
>                 Key: CRUNCH-596
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-596
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.13.0
>            Reporter: Piotr Chromiec
>            Assignee: Josh Wills
>            Priority: Minor
>              Labels: features, github-import, newbie
>             Fix For: 0.14.0
>
>
> It seems that the current Bloom filter join strategy lacks support for right and 
> full outer joins. At RTBHOUSE we recently found this useful and 
> implemented it for our internal project. Code for this feature, with javadoc and 
> tests, has been pushed to GitHub: 
> [PullRequest#9|https://github.com/apache/crunch/pull/9]
> I'm a newbie here, so forgive me if this issue is somehow incomplete or buggy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
