[ 
https://issues.apache.org/jira/browse/CRUNCH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792698#comment-13792698
 ] 

Gabriel Reid commented on CRUNCH-278:
-------------------------------------

Ok, I get it. 

Making it possible in the API to specify the boundary between the MR job and 
in-memory processing is what I was going for with the MaterializedPCollection 
constructor that I posted before (copied here below).

{code}
PTable<ImmutableBytesWritable,Result> htableContents =
    pipeline.read(FromHBase.table());
PTable<A,B> convertedHTable =
    new MaterializedPCollection(htableContents).parallelDo(new DoSomethingFn());
PTable<A,Pair<C,B>> joined =
    new MapsideJoinStrategy().join(anotherPTable, convertedHTable);
{code}

My idea was that everything downstream of the MaterializedPCollection would be 
done in memory. Whatever was being calculated upstream in the pipeline would 
be read into memory at the point where you instantiated the 
MaterializedPCollection.
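
To make the boundary idea a bit more concrete, here is a rough, plain-Java 
sketch of the behaviour I have in mind. MaterializedSketch is a hypothetical 
stand-in for the proposed MaterializedPCollection and is not a Crunch class; 
it takes an ordinary Iterable in place of an upstream PCollection and 
java.util.function.Function in place of a DoFn, so treat it as an illustration 
of the semantics rather than an implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical stand-in for the proposed MaterializedPCollection.
 * Not part of Crunch; it only illustrates the MR / in-memory boundary.
 */
final class MaterializedSketch<T> {
    private final List<T> values;

    /**
     * Constructing the wrapper is the boundary: the upstream results
     * (here just an Iterable) are read fully into memory at this point.
     */
    MaterializedSketch(Iterable<T> upstream) {
        List<T> copy = new ArrayList<>();
        for (T value : upstream) {
            copy.add(value);
        }
        this.values = Collections.unmodifiableList(copy);
    }

    /**
     * Applies fn eagerly to the in-memory values, so everything derived
     * from this object also lives in memory (Function stands in for a DoFn).
     */
    <R> MaterializedSketch<R> parallelDo(Function<? super T, ? extends R> fn) {
        List<R> out = new ArrayList<>(values.size());
        for (T value : values) {
            out.add(fn.apply(value));
        }
        return new MaterializedSketch<>(out);
    }

    /** Read-only view of the in-memory contents. */
    List<T> asList() {
        return values;
    }
}
```

With this sketch, wrapping a list of strings and calling 
parallelDo(String::length) gives another in-memory MaterializedSketch holding 
the lengths. The point is that the constructor call is the one explicit place 
where "done in MR" ends and "done in memory" begins.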

In any case, yeah, I think it would be pretty important to be able to clearly 
specify which things you want done in MR and which you want done in memory.

> Improvements to MapsideJoin code
> --------------------------------
>
>                 Key: CRUNCH-278
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-278
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, MapReduce Patterns
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-278.patch
>
>
> The fact that we have special-case code in the MapsideJoinStrategy for the 
> in-memory and MR-based Pipeline instances has always bugged me, so I set out 
> to eliminate the distinction between the two impls by creating a new 
> interface, ReadableSourceBundle<T>, that encapsulates the MR and in-memory 
> specific logic for doing mapside joins in order to remove the special-case 
> code in MapsideJoinStrategy and hopefully make other implementations that use 
> our mapside-join patterns much easier to test.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
