[ 
https://issues.apache.org/jira/browse/CRUNCH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794939#comment-13794939
 ] 

Gabriel Reid commented on CRUNCH-278:
-------------------------------------

Yeah, I think that could work for the more general case. Calling toBundle 
on a PCollection would then back up to the last call to materialize and execute 
everything from that point on in memory, and the default case would be to do 
nothing in memory.

The only issue I see with this is that it turns the materialize() call into 
something that visibly mutates the state of a PCollection. Materializing a 
PCollection mutates state under the covers anyhow, but adding these semantics 
to materialize very slightly breaks the idea of immutability around 
PCollection. That's probably not a big enough reason to reject this approach, 
though.
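
To make the proposed semantics concrete, here is a minimal toy sketch in plain 
Java. It does not use the Crunch API: ToyCollection, its map, materialize and 
toBundle methods, and the log labels are all hypothetical stand-ins, and the 
"pipeline" is only simulated. It assumes that toBundle walks back to the most 
recent materialized ancestor and runs only the later steps in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy stand-in for a PCollection. This is NOT the Crunch API; all names here
// are hypothetical and exist only to illustrate the semantics under discussion.
final class ToyCollection<T> {
    // Computes a collection's elements. `inMemory` says whether this step is
    // being run in memory or by the (simulated) pipeline.
    private interface Eval<E> {
        List<E> apply(boolean inMemory, List<String> log);
    }

    private final ToyCollection<?> parent; // null for a source
    private final Eval<T> eval;
    private boolean materialized;          // the visibly mutable state

    private ToyCollection(ToyCollection<?> parent, Eval<T> eval) {
        this.parent = parent;
        this.eval = eval;
    }

    static <E> ToyCollection<E> of(List<E> data) {
        List<E> copy = new ArrayList<>(data);
        return new ToyCollection<E>(null, (inMemory, log) -> copy);
    }

    <U> ToyCollection<U> map(String name, Function<T, U> fn) {
        ToyCollection<T> self = this;
        return new ToyCollection<U>(self, (inMemory, log) -> {
            // At or upstream of a materialized collection, the pipeline does the work.
            boolean parentInMemory = inMemory && !self.materialized;
            List<T> input = self.eval.apply(parentInMemory, log);
            log.add(name + (inMemory ? ":memory" : ":pipeline"));
            List<U> out = new ArrayList<>();
            for (T t : input) {
                out.add(fn.apply(t));
            }
            return out;
        });
    }

    // Marks this collection as a checkpoint; this is the mutation in question.
    ToyCollection<T> materialize() {
        materialized = true;
        return this;
    }

    // Runs steps after the last materialized ancestor in memory; if there is
    // no such ancestor, nothing runs in memory.
    List<T> toBundle(List<String> log) {
        boolean inMemory = !materialized && hasMaterializedAncestor();
        return eval.apply(inMemory, log);
    }

    private boolean hasMaterializedAncestor() {
        for (ToyCollection<?> c = parent; c != null; c = c.parent) {
            if (c.materialized) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch, given a = src.map(...) and b = a.map(...), calling 
b.toBundle(log) logs both steps as pipeline steps. After a.materialize(), the 
same b logs its own step as an in-memory step. That change in the behaviour of 
an existing reference is the visible mutation described above.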

> Improvements to MapsideJoin code
> --------------------------------
>
>                 Key: CRUNCH-278
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-278
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, MapReduce Patterns
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-278.patch
>
>
> The fact that we have special-case code in the MapsideJoinStrategy for the 
> in-memory and MR-based Pipeline instances has always bugged me. I set out 
> to eliminate the distinction between the two impls by creating a new 
> interface, ReadableSourceBundle<T>, that encapsulates the MR-specific and 
> in-memory-specific logic for doing mapside joins. The goal is to remove the 
> special-case code in MapsideJoinStrategy and hopefully make other 
> implementations that use our mapside-join patterns much easier to test.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
