[ 
https://issues.apache.org/jira/browse/CRUNCH-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210769#comment-15210769
 ] 

Stefan De Smit commented on CRUNCH-598:
---------------------------------------

I would only fix ShardedJoinStrategy. With DefaultJoinStrategy you can already 
pass your own JoinFn, and the scale factor is pretty custom anyway. I think the 
default factor of 1 is inaccurate for a join in most cases, but since there is 
already a way to set it, I don't think an extra argument adds value.
The sharded join has no such way and needs it more.
I would just add a method "getAverageNumShards()" to the ShardingStrategy 
interface and call it to set the scale factor for leftshardfn, as that is the 
correct scale factor for this function.
In ConstantShardingStrategy, this method can just return numShards.
For any other strategy, the scale factor would be some kind of average shard 
count, which the user has to define.
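
A minimal sketch of that proposal, in case it helps the discussion. It does not 
use the real Crunch classes: the types below are simplified stand-ins, and 
getAverageNumShards() as well as the LeftShardFn name are hypothetical (they 
are what is being proposed here, not existing API).

```java
// Simplified stand-in for Crunch's ShardingStrategy, not the real interface.
interface ShardingStrategy<K> {
    // Number of shards a given key is spread over.
    int getNumShards(K key);

    // Proposed (hypothetical) addition: expected number of shards per key.
    float getAverageNumShards();
}

// Constant strategy: every key gets the same number of shards,
// so the average is simply that number.
final class ConstantShardingStrategy<K> implements ShardingStrategy<K> {
    private final int numShards;

    ConstantShardingStrategy(int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        this.numShards = numShards;
    }

    @Override
    public int getNumShards(K key) {
        return numShards;
    }

    @Override
    public float getAverageNumShards() {
        return numShards;
    }
}

// Hypothetical stand-in for the function that emits every left-side record
// once per shard. Its scale factor is the average number of shards.
final class LeftShardFn<K> {
    private final ShardingStrategy<K> strategy;

    LeftShardFn(ShardingStrategy<K> strategy) {
        this.strategy = strategy;
    }

    float scaleFactor() {
        return strategy.getAverageNumShards();
    }
}

final class ScaleFactorSketch {
    public static void main(String[] args) {
        ShardingStrategy<String> strategy = new ConstantShardingStrategy<>(8);
        LeftShardFn<String> fn = new LeftShardFn<>(strategy);
        // With 8 constant shards, the left side grows by a factor of 8.
        System.out.println(fn.scaleFactor());
    }
}
```

A non-constant strategy would have to supply its own estimate in 
getAverageNumShards(), since the planner cannot derive it from the per-key 
method alone.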


> scaleFactor for JoinStrategy
> ----------------------------
>
>                 Key: CRUNCH-598
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-598
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Stefan De Smit
>            Priority: Minor
>
> The scaleFactor method has a big influence on the planner.
> For joins, there currently isn't a clean way to set it, although it is often 
> required, as a join can have a large multiplication factor.
> For the DefaultJoinStrategy, it's possible to add a custom JoinFn with a proper 
> scaleFactor, or just extend the default InnerJoinFn with a scaleFactor.
> For the ShardedJoinStrategy, this isn't possible, although it is often needed 
> more (as ShardedJoin is especially handy for one-to-very-many joins).
> For the default ConstantShardingStrategy, it might make sense to also use 
> numShards as the scale factor for the left side, as that's roughly what 
> happens: every left entry is emitted numShards times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
