Gabriel Reid created CRUNCH-211:
-----------------------------------
Summary: Add one-to-many join functionality
Key: CRUNCH-211
URL: https://issues.apache.org/jira/browse/CRUNCH-211
Project: Crunch
Issue Type: Bug
Reporter: Gabriel Reid
A common pattern is a join between two tables where the left-side table
contains a single value per key, and the right-side table contains multiple
values per key. For example, consider a join between users and web click
entries:
PTable<Long,User> usersById = ...;
PTable<Long,WebClick> webClicksByUserId = ...;
In some situations it is desirable to bring each User together with an
Iterable of all of its WebClicks. The current join functionality replicates the
User once for each related WebClick, so each WebClick must then be processed
completely separately.
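To make the current behaviour concrete, here is a plain-Java sketch of the
one-to-one output shape (similar in spirit to the PTable<K, Pair<U, V>> that a
standard join produces): every matching WebClick yields its own (User, WebClick)
row, so the User is repeated. The names ReplicatingJoinSketch, Joined and join
are hypothetical, not Crunch API, and the right side is held in a Map only to
keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ReplicatingJoinSketch {

    /** Hypothetical stand-in for one joined output row (left value, right value). */
    record Joined<U, V>(U left, V right) {}

    /**
     * Inner join of a one-value-per-key left side with a many-values-per-key right side.
     * Every right value yields its own output row, so the left value is repeated per match.
     */
    static <K, U, V> List<Joined<U, V>> join(Map<K, U> left, Map<K, List<V>> right) {
        List<Joined<U, V>> out = new ArrayList<>();
        for (Map.Entry<K, U> entry : left.entrySet()) {
            // Keys with no right-side values produce no rows (inner-join semantics).
            List<V> matches = right.getOrDefault(entry.getKey(), List.of());
            for (V match : matches) {
                out.add(new Joined<>(entry.getValue(), match));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, String> usersById = Map.of(1L, "alice", 2L, "bob");
        Map<Long, List<String>> clicksByUserId = Map.of(1L, List.of("/home", "/cart", "/checkout"));
        // "alice" appears once per click; "bob" has no clicks and produces no rows.
        for (Joined<String, String> row : join(usersById, clicksByUserId)) {
            System.out.println(row.left() + " -> " + row.right());
        }
    }
}
```

Any code that needs all of a user's clicks at once has to regroup these rows,
which is exactly the work the one-to-many join is meant to avoid.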
Currently, the only way to get an Iterable of WebClicks together with a single
User in a single method call is to materialize all of a user's WebClicks in
memory, using something like PTable#collectValues. This approach does not work
when a user has a large number of WebClicks.
The intention of this ticket is to add functionality that makes the User and
the Iterable of its WebClicks available in a single method call, without
materializing that Iterable in memory (i.e. an approach that remains feasible
for millions of WebClicks or more).
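One way such a streaming join can be built is sketched below in plain Java. It
assumes (this is an assumption, not something stated above) that records arrive
sorted by key and then by a side tag, with the single left value ahead of the
right values, as a reduce-side join with a secondary sort would deliver them.
The callback then receives the left value plus a lazy Iterable that pulls right
values straight from the input, so nothing per key is buffered. The names
StreamingOneToManyJoin, Tagged, LEFT, RIGHT and join are hypothetical and are
not Crunch API.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiConsumer;

public class StreamingOneToManyJoin {

    // Hypothetical side tags: within one key, the left record sorts before right records.
    static final int LEFT = 0;
    static final int RIGHT = 1;

    /** A hypothetical shuffled record: join key, side tag, and payload. */
    record Tagged(long key, int tag, Object value) {}

    /** One-element lookahead over an iterator, needed to detect key boundaries. */
    static final class Peeking implements Iterator<Tagged> {
        private final Iterator<Tagged> it;
        private Tagged next;
        Peeking(Iterator<Tagged> it) { this.it = it; advance(); }
        private void advance() { next = it.hasNext() ? it.next() : null; }
        Tagged peek() { return next; }
        public boolean hasNext() { return next != null; }
        public Tagged next() {
            if (next == null) throw new NoSuchElementException();
            Tagged current = next;
            advance();
            return current;
        }
    }

    /** Calls fn once per key that has a left value, passing that value and a lazy Iterable of right values. */
    @SuppressWarnings("unchecked")
    static <U, V> void join(Iterator<Tagged> sorted, BiConsumer<U, Iterable<V>> fn) {
        Peeking in = new Peeking(sorted);
        while (in.hasNext()) {
            Tagged head = in.next();
            long key = head.key();
            if (head.tag() != LEFT) {
                // No left value for this key: skip its right values (inner-join semantics).
                while (in.hasNext() && in.peek().key() == key) in.next();
                continue;
            }
            U left = (U) head.value();
            // Lazy view over this key's right values; each one is read from the input on demand.
            Iterator<V> rights = new Iterator<>() {
                public boolean hasNext() {
                    return in.hasNext() && in.peek().key() == key && in.peek().tag() == RIGHT;
                }
                public V next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return (V) in.next().value();
                }
            };
            // The Iterable is single-use and only valid during this callback.
            fn.accept(left, () -> rights);
            // Drain anything the callback did not consume so the next key starts cleanly.
            while (rights.hasNext()) rights.next();
        }
    }

    public static void main(String[] args) {
        List<Tagged> shuffled = List.of(
                new Tagged(1L, LEFT, "alice"),
                new Tagged(1L, RIGHT, "/home"),
                new Tagged(1L, RIGHT, "/cart"),
                new Tagged(2L, LEFT, "bob"),
                new Tagged(3L, RIGHT, "/orphan"));
        StreamingOneToManyJoin.<String, String>join(shuffled.iterator(), (user, clicks) -> {
            int count = 0;
            for (String click : clicks) count++;
            System.out.println(user + " has " + count + " clicks"); // prints "alice has 2 clicks", then "bob has 0 clicks"
        });
    }
}
```

The trade-off of this design is that the Iterable can be traversed only once
and only inside the callback, because it shares the underlying input with the
rest of the join; in exchange, memory use stays constant regardless of how many
WebClicks a single user has.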
--
This message is automatically generated by JIRA.