[
https://issues.apache.org/jira/browse/CRUNCH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792250#comment-13792250
]
Micah Whitacre commented on CRUNCH-278:
---------------------------------------
{quote}
For example, think of being able to do mapside joins against (say) an HBase
table, where you could construct the PTable of key-value pairs that is loaded
in memory by reading the table into the client and then doing some processing
on those values inside of the map initialization vs. having to run a MR job to
process that data into a file as a pre-processing step to running the job. I'm
not sure if that's the sort of thing folks would be interested in doing, but it
seemed cool to me.
{quote}
Did someone give you a copy of our code? :) We don't do the map-side portion,
but we have a number of use cases where the data should be small enough that we
could do the join map-side. Additionally, our APIs are written in terms of
PTable<Avro,Avro>, so we usually transform the PTable<ImmutableBytesWritable,
Result> read from HBase into a PTable<Avro,Avro> using simple MapFns before we
would want to do the joins.
I still need to review the ReadableSourceBundle, but I wanted to confirm that
the use case you are heading towards would definitely get used.
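To make the pattern under discussion concrete, here is a minimal plain-Java
sketch of a map-side (hash) join: the small side is transformed once and held
in memory, and the large side is streamed against it. It deliberately uses no
Crunch or HBase classes; MapsideJoinSketch, loadSmallSide and join are
hypothetical names for illustration, and String keys and values stand in for
the Avro records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustration only: plain Java, not the Crunch MapsideJoinStrategy.
class MapsideJoinSketch {

    // One-time setup: apply a per-value transform (the role a simple MapFn
    // plays in the comment above) and keep the result in memory.
    static Map<String, String> loadSmallSide(Map<String, String> raw,
                                             Function<String, String> transform) {
        Map<String, String> lookup = new HashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            lookup.put(e.getKey(), transform.apply(e.getValue()));
        }
        return lookup;
    }

    // Streams the large side as {key, value} pairs and probes the in-memory
    // table. Inner join: records without a match are dropped.
    static List<String> join(Map<String, String> lookup, List<String[]> large) {
        List<String> out = new ArrayList<>();
        for (String[] rec : large) {
            String match = lookup.get(rec[0]);
            if (match != null) {
                out.add(rec[0] + ":" + rec[1] + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("u1", "alice");
        raw.put("u2", "bob");
        Map<String, String> lookup = loadSmallSide(raw, String::toUpperCase);

        List<String[]> large = new ArrayList<>();
        large.add(new String[] {"u1", "click"});
        large.add(new String[] {"u3", "view"});      // no match, dropped
        large.add(new String[] {"u2", "purchase"});

        // prints [u1:click,ALICE, u2:purchase,BOB]
        System.out.println(join(lookup, large));
    }
}
```

The sketch only works when the small side fits in memory, which is the
assumption behind the use cases described above.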
> Improvements to MapsideJoin code
> --------------------------------
>
> Key: CRUNCH-278
> URL: https://issues.apache.org/jira/browse/CRUNCH-278
> Project: Crunch
> Issue Type: Bug
> Components: Core, MapReduce Patterns
> Reporter: Josh Wills
> Assignee: Josh Wills
> Attachments: CRUNCH-278.patch
>
>
> The special-case code in MapsideJoinStrategy for the in-memory and MR-based
> Pipeline instances has always bugged me, so I set out to eliminate the
> distinction between the two impls. I did this by creating a new interface,
> ReadableSourceBundle<T>, that encapsulates the MR-specific and
> in-memory-specific logic for doing mapside joins. This removes the
> special-case code from MapsideJoinStrategy and should make other
> implementations that use our mapside-join patterns much easier to test.
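
The design idea in the description can be sketched in plain Java as follows.
This is not the actual ReadableSourceBundle<T> signature from the patch:
SmallSideSource, InMemorySource, DeferredSource and JoinSetup are hypothetical
names, and the deferred variant merely stands in for whatever an MR-backed
implementation would do. The point is that the join setup depends only on the
interface, so it needs no special cases and can be tested with in-memory data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the idea behind ReadableSourceBundle<T>.
interface SmallSideSource<T> {
    Iterable<T> read();
}

// Data already materialized in the client, as in an in-memory pipeline.
class InMemorySource<T> implements SmallSideSource<T> {
    private final List<T> items;
    InMemorySource(List<T> items) { this.items = items; }
    public Iterable<T> read() { return items; }
}

// Data loaded only when read() is called, standing in for a source that
// would be read inside the map task.
class DeferredSource<T> implements SmallSideSource<T> {
    private final Supplier<List<T>> loader;
    DeferredSource(Supplier<List<T>> loader) { this.loader = loader; }
    public Iterable<T> read() { return loader.get(); }
}

class JoinSetup {
    // Builds the in-memory lookup table from {key, value} pairs without
    // knowing which kind of source it was given.
    static Map<String, String> buildLookup(SmallSideSource<String[]> source) {
        Map<String, String> lookup = new HashMap<>();
        for (String[] kv : source.read()) {
            lookup.put(kv[0], kv[1]);
        }
        return lookup;
    }
}
```

A test can pass an InMemorySource to buildLookup and exercise the same code
path that a deferred, job-side source would use.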