[ https://issues.apache.org/jira/browse/CRUNCH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794939#comment-13794939 ]
Gabriel Reid commented on CRUNCH-278:
-------------------------------------
Yeah, I think that could work for the more general case. Calling toBundle
on a PCollection would then back up to the last call to materialize and execute
everything from there on in memory; the default would be to do nothing in
memory.
The only issue I see is that it turns the materialize() call into something
that visibly mutates the state of a PCollection. Materializing a PCollection
already mutates state under the covers, but adding these semantics to
materialize slightly breaks the idea of immutability around PCollection.
That's probably not a big enough reason to avoid this approach, though.
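
To make the concern concrete, here is a minimal sketch of the semantics
described above. It is a toy model, not the Crunch API: ToyCollection,
ToyBundle, and all of their members are hypothetical names made up for
illustration, and the attached patch may define toBundle quite differently.
The sketch only shows how a toBundle call could walk back to the nearest
materialized ancestor, and why that makes materialize() an observable state
change.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

public class MaterializeSketch {

    /** Hypothetical stand-in for a PCollection of strings; not the Crunch class. */
    static final class ToyCollection {
        final String name;
        final ToyCollection parent;      // null for a source collection
        final UnaryOperator<String> fn;  // step deriving this collection from its parent
        boolean materialized;            // the visible state the comment worries about

        ToyCollection(String name, ToyCollection parent, UnaryOperator<String> fn) {
            this.name = name;
            this.parent = parent;
            this.fn = fn;
        }

        ToyCollection map(String childName, UnaryOperator<String> f) {
            return new ToyCollection(childName, this, f);
        }

        /** Assumed semantics: marks this collection as a restart point for toBundle. */
        ToyCollection materialize() {
            materialized = true;
            return this;
        }

        /** Backs up to the last materialized ancestor and collects the steps after it. */
        ToyBundle toBundle() {
            List<UnaryOperator<String>> steps = new ArrayList<>();
            ToyCollection cursor = this;
            while (cursor != null && !cursor.materialized) {
                if (cursor.fn != null) {
                    steps.add(cursor.fn);
                }
                cursor = cursor.parent;
            }
            if (cursor == null) {
                // No materialize() upstream: the default case, nothing runs in memory.
                return new ToyBundle(this, Collections.<UnaryOperator<String>>emptyList());
            }
            Collections.reverse(steps); // steps were gathered leaf-first
            return new ToyBundle(cursor, steps);
        }
    }

    /** Hypothetical bundle: where to read from, plus the steps to replay in memory. */
    static final class ToyBundle {
        final ToyCollection readFrom;
        final List<UnaryOperator<String>> inMemorySteps;

        ToyBundle(ToyCollection readFrom, List<UnaryOperator<String>> inMemorySteps) {
            this.readFrom = readFrom;
            this.inMemorySteps = inMemorySteps;
        }

        String apply(String record) {
            String out = record;
            for (UnaryOperator<String> step : inMemorySteps) {
                out = step.apply(out);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ToyCollection source = new ToyCollection("source", null, null);
        ToyCollection trimmed = source.map("trimmed", String::trim);
        ToyCollection shouted = trimmed.map("upper", String::toUpperCase)
                                       .map("bang", s -> s + "!");

        // Same collection, two different answers, depending on a call made elsewhere.
        System.out.println(shouted.toBundle().inMemorySteps.size());
        trimmed.materialize();
        System.out.println(shouted.toBundle().inMemorySteps.size());
    }
}
```

In this model, calling materialize() on an upstream collection changes what
toBundle returns for every collection derived from it, which is the small
break in immutability mentioned above.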
> Improvements to MapsideJoin code
> --------------------------------
>
> Key: CRUNCH-278
> URL: https://issues.apache.org/jira/browse/CRUNCH-278
> Project: Crunch
> Issue Type: Bug
> Components: Core, MapReduce Patterns
> Reporter: Josh Wills
> Assignee: Josh Wills
> Attachments: CRUNCH-278.patch
>
>
> The fact that we have special-case code in the MapsideJoinStrategy for the
> in-memory and MR-based Pipeline instances has always bugged me. I set out to
> eliminate the distinction between the two implementations by creating a new
> interface, ReadableSourceBundle<T>, that encapsulates the MR-specific and
> in-memory-specific logic for doing mapside joins. This removes the
> special-case code in MapsideJoinStrategy and should make other
> implementations that use our mapside-join patterns much easier to test.
--
This message was sent by Atlassian JIRA
(v6.1#6144)