[ 
https://issues.apache.org/jira/browse/CRUNCH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793879#comment-13793879
 ] 

Micah Whitacre commented on CRUNCH-278:
---------------------------------------

The MaterializedPCollection approach is appealing because it meshes with metaphors 
already in Crunch, but it seems dangerous for the ill-informed consumer. 
Specifically, since a PCollection can be passed around, it might be handed to 
functionality that expects to be able to persist the collection and then runs 
into the issue.

The bundle approach therefore seems preferable because it makes that 
distinction explicit. To confirm, though, if we went with this approach...

{quote}
PTable<K, V> cnt = stuff.count();
ReadableSourceBundle<Pair<K, V>> bundle = cnt.toBundle();
{quote}

Consumers could still do whatever processing or persisting they wanted with the 
"cnt" value, correct? So cnt.toBundle() would have no effect on it? Also, 
would GBKs be allowed prior to creating the bundle? In HBase, rows can be 
broken up in a PTable due to the configured batch size and could potentially 
require that grouping.

> Improvements to MapsideJoin code
> --------------------------------
>
>                 Key: CRUNCH-278
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-278
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, MapReduce Patterns
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-278.patch
>
>
> The fact that we have special-case code in the MapsideJoinStrategy for the 
> in-memory and MR-based Pipeline instances has always bugged me, so I set out 
> to eliminate the distinction between the two impls by creating a new 
> interface, ReadableSourceBundle<T>, that encapsulates the MR- and 
> in-memory-specific logic for doing mapside joins. The goal is to remove the 
> special-case code in MapsideJoinStrategy and hopefully make other 
> implementations that use our mapside-join patterns much easier to test.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
