[ 
https://issues.apache.org/jira/browse/CRUNCH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791959#comment-13791959
 ] 

Josh Wills commented on CRUNCH-278:
-----------------------------------

So I had two contexts in mind. One is in-memory execution for unit testing. 
The other is running these DoFns inside of an MR job, where they're not 
strictly part of the CrunchMapper/CrunchReducer flow but are embedded in the 
initialize() step, which reads records in from the distributed cache and then 
performs filters/transforms on them.

For example, think of being able to do mapside joins against (say) an HBase 
table. You could construct the in-memory PTable of key-value pairs by reading 
the table into the client and then processing those values inside of the map 
initialization, instead of having to run a separate MR job that writes that 
data to a file as a pre-processing step before the real job. I'm not sure if 
that's the sort of thing folks would be interested in doing, but it seemed 
cool to me.
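
To make the idea concrete, here is a rough sketch in plain Java. It is not 
Crunch code and does not reflect the attached patch: the class 
InitTimeJoinSketch, the SideFn interface, and the "key,value,status" record 
layout are all hypothetical stand-ins. SideFn plays the role of a DoFn, and 
initialize() plays the role of the map-side setup step that would read the 
side data and filter/transform it in memory before any input records are 
joined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical sketch, not Crunch's API: applies a DoFn-like function to side
 * records during initialization and uses the result for a mapside join.
 */
public class InitTimeJoinSketch {

    /** Stand-in for a DoFn: consumes one record, emits zero or more key-value pairs. */
    public interface SideFn<S, K, V> {
        void process(S input, BiConsumer<K, V> emitter);
    }

    // In-memory join table built once, before any left-side records are processed.
    private final Map<String, List<String>> lookup = new HashMap<>();

    /** Plays the role of initialize(): run the function over the side records in memory. */
    public <S> void initialize(Iterable<S> sideRecords, SideFn<S, String, String> fn) {
        for (S record : sideRecords) {
            fn.process(record,
                (k, v) -> lookup.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
    }

    /** Plays the role of the map step: join one left-side record against the table. */
    public List<String> join(String key, String leftValue) {
        List<String> out = new ArrayList<>();
        for (String right : lookup.getOrDefault(key, Collections.<String>emptyList())) {
            out.add(key + "\t" + leftValue + "\t" + right);
        }
        return out;
    }

    public static void main(String[] args) {
        InitTimeJoinSketch task = new InitTimeJoinSketch();
        // Assumed record layout: "key,value,status".
        List<String> side = Arrays.asList("u1,alice,active", "u2,bob,deleted", "u1,al,active");
        // Keep only active rows and upper-case the value: a filter plus a transform.
        task.initialize(side, (line, emit) -> {
            String[] f = line.split(",");
            if (f[2].equals("active")) {
                emit.accept(f[0], f[1].toUpperCase());
            }
        });
        System.out.println(task.join("u1", "click"));
        System.out.println(task.join("u2", "view"));
    }
}
```

The point of the sketch is only that the filter/transform runs at 
initialization time on data already in memory, so no separate job is needed 
to pre-process the side input.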

> Improvements to MapsideJoin code
> --------------------------------
>
>                 Key: CRUNCH-278
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-278
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, MapReduce Patterns
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-278.patch
>
>
> The fact that we have special-case code in the MapsideJoinStrategy for the 
> in-memory and MR-based Pipeline instances has always bugged me, so I set out 
> to eliminate the distinction between the two impls by creating a new 
> interface, ReadableSourceBundle<T>, that encapsulates the MR and in-memory 
> specific logic for doing mapside joins in order to remove the special-case 
> code in MapsideJoinStrategy and hopefully make other implementations that use 
> our mapside-join patterns much easier to test.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
