[
https://issues.apache.org/jira/browse/FLINK-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220775#comment-14220775
]
Stephan Ewen commented on FLINK-1267:
-------------------------------------
I think it may be useful, but is not really a urgent at this point.
The reason is that the groupReduce variant should be fine in virtually all
cases. If you have so many elements that the groupReduce variant runs out of
memory, you probably will not be able to compute the cross product of that
group with itself in any reasonable time anyways...
> Add crossGroup operator
> -----------------------
>
> Key: FLINK-1267
> URL: https://issues.apache.org/jira/browse/FLINK-1267
> Project: Flink
> Issue Type: New Feature
> Components: Java API, Local Runtime, Optimizer, Scala API
> Affects Versions: 0.7.0-incubating
> Reporter: Fabian Hueske
> Priority: Minor
>
> A common operator is to pair-wise compare or combine all elements of a group
> (there were two questions about this on the user mailing list, recently).
> Right now, this can be done in two ways:
> 1. {{groupReduce}}: consume and store the complete iterator in memory and
> build all pairs
> 2. do a self-{{Join}}: the engine builds all pairs of the full symmetric
> Cartesian product.
> Both approaches have drawbacks. The {{groupReduce}} variant requires that the
> full group fits into memory and is more cumbersome to implement for the user,
> but pairs can be arbitrarily built. The self-{{Join}} approach pushes most of
> the work into the system, but the execution strategy does not treat the
> self-join different from a regular join (both identical inputs are shuffled,
> etc.) and always builds the full symmetric Cartesian product.
> I propose to add a dedicated {{crossGroup()}} operator, that offers this
> functionality in a proper way.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)