[ https://issues.apache.org/jira/browse/CRUNCH-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid updated CRUNCH-211:
--------------------------------

    Attachment: CRUNCH-211.patch

Patch to provide one-to-many join functionality
                
> Add one-to-many join functionality
> ----------------------------------
>
>                 Key: CRUNCH-211
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-211
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>         Attachments: CRUNCH-211.patch
>
>
> A common pattern is a join between two tables where the left-side table 
> contains a single value per key, and the right-side table contains multiple 
> values per key. An example of such a join would be a join between users and 
> web click entries:
>     PTable<Long,User> usersById = ...;
>     PTable<Long,WebClick> webClicksByUserId = ...;
> In this case, there are situations where it is desirable to bring the
> User together with an Iterable of all of that user's WebClicks. The current
> join functionality replicates the User for each WebClick it is related to,
> so each WebClick must then be processed completely separately.
> Currently, the only way to get an Iterable of WebClicks together with a
> single User in a single method call is to materialize all WebClicks per
> user in memory using something like PTable#collectValues, and this approach
> doesn't work when a user has a large number of WebClicks.
> The intention of this ticket is to add functionality whereby the User and
> the Iterable of WebClicks are available in a single method call, without the
> Iterable of WebClicks being materialized in memory (i.e. an approach that is
> feasible for millions of WebClicks or more).
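The ticket does not spell out an implementation. One common way to hand a single left-side value together with a streamed Iterable of right-side values to a function is a reduce-side join with a secondary sort: left values are tagged 0, right values are tagged 1, and records are sorted by (key, tag), so within each key group the single User arrives first and its WebClicks follow and can be consumed lazily. The plain-Java sketch below simulates only that reducer-side step over an already-sorted iterator. It uses no Crunch or Hadoop APIs; `OneToManyJoinSketch`, `Tagged`, `SHUFFLE_ORDER` and `join` are hypothetical names, and User/WebClick values are simplified to strings.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.BiConsumer;

public class OneToManyJoinSketch {

    // Hypothetical tagged record: tag 0 marks a left-side value (e.g. a User),
    // tag 1 marks a right-side value (e.g. a WebClick). Values are plain
    // strings to keep the sketch self-contained.
    public record Tagged(long key, int tag, String value) {}

    // Order assumed to be applied by the shuffle (secondary sort): by key,
    // then by tag, so a key's left value arrives before its right values.
    public static final Comparator<Tagged> SHUFFLE_ORDER =
        Comparator.comparingLong(Tagged::key).thenComparingInt(Tagged::tag);

    // Calls fn once per key that has a left value, passing that value and a
    // single-pass Iterable over the key's right values. Right values are
    // pulled from the shared input one at a time rather than buffered.
    public static void join(Iterator<Tagged> sorted,
                            BiConsumer<String, Iterable<String>> fn) {
        Peeking in = new Peeking(sorted);
        while (in.hasNext()) {
            Tagged first = in.next();
            if (first.tag() != 0) {
                continue; // right value with no left value: dropped (inner join)
            }
            long key = first.key();
            Iterator<String> rights = new Iterator<>() {
                @Override
                public boolean hasNext() {
                    return in.hasNext() && in.peek().key() == key && in.peek().tag() == 1;
                }

                @Override
                public String next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return in.next().value();
                }
            };
            // Iterable is a functional interface; this one hands out the same
            // iterator every time, so it can only be traversed once.
            fn.accept(first.value(), () -> rights);
            // Skip any right values the callback did not consume.
            while (rights.hasNext()) {
                rights.next();
            }
        }
    }

    // Minimal one-element lookahead over the sorted input.
    static final class Peeking {
        private final Iterator<Tagged> it;
        private Tagged head;

        Peeking(Iterator<Tagged> it) {
            this.it = it;
            advance();
        }

        private void advance() {
            head = it.hasNext() ? it.next() : null;
        }

        boolean hasNext() { return head != null; }

        Tagged peek() { return head; }

        Tagged next() {
            Tagged t = head;
            advance();
            return t;
        }
    }
}
```

In a real pipeline the sorting and grouping would be done by the framework's shuffle. The point of the sketch is that the consumer sees each User exactly once alongside its WebClicks while only one record of lookahead is held in memory, which is what makes the approach viable for very large numbers of WebClicks per user.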

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
