Maybe it is easier to build what you're looking for by contributing to
plyrmr:
https://github.com/RevolutionAnalytics/plyrmr
It already implements "plyr for Hadoop" on top of the rmr2 package. Not
sure whether merging is already implemented, but using rmr2 it should
not be prohibitively difficult (hopefully).
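
In case it helps to have the core lookup in front of you, here is a minimal
Java sketch of what a rolling join does per row, assuming roll=TRUE semantics
(last observation carried forward): for each query value, find the last key
in a sorted array that is <= that value. This is a plain binary search
written for illustration, not a port of bmerge.c, and the names
(RollingJoinSketch, rollIndex, rollingJoin) are made up for the example. In a
MapReduce job you would presumably run something like this inside a reducer,
once per join group, after sorting that group's keys.

```java
import java.util.Arrays;

// Illustrative sketch only: class and method names are hypothetical.
class RollingJoinSketch {

    // Returns the index of the last element of sortedKeys that is <= x,
    // or -1 if every key is greater than x (nothing to roll forward).
    static int rollIndex(long[] sortedKeys, long x) {
        int lo = 0;
        int hi = sortedKeys.length - 1;
        int ans = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;   // overflow-safe midpoint
            if (sortedKeys[mid] <= x) {
                ans = mid;               // candidate match, look further right
                lo = mid + 1;
            } else {
                hi = mid - 1;            // key too large, look left
            }
        }
        return ans;
    }

    // For each query value, the matching row index in sortedKeys (or -1).
    static int[] rollingJoin(long[] sortedKeys, long[] queries) {
        int[] out = new int[queries.length];
        for (int i = 0; i < queries.length; i++) {
            out[i] = rollIndex(sortedKeys, queries[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] keys = {10, 20, 30};       // e.g. sorted timestamps
        long[] queries = {5, 10, 25, 99}; // values to look up
        System.out.println(Arrays.toString(rollingJoin(keys, queries)));
    }
}
```

Running main prints [-1, 0, 1, 2]: 5 precedes every key, 10 matches exactly,
25 rolls back to 20, and 99 rolls back to 30.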
Best,
M
On 12/03/2014 11:47 AM, Mike.Gahan wrote:
Hello all,
I absolutely love the rolling join capabilities of data.table. It is
extremely useful for the work I do. However, sometimes I work with data that
is too large to fit into RAM (even when using a large server). I want to
implement this rolling join code in a Java Map Reduce setting to be able to
leverage some of the other resources available at the company I work for.
Unfortunately I am not an experienced Java programmer. I figured that a
project like this would provide an excellent incentive to learn this skill.
My question is this: what current data.table code for rolling joins would be
most useful to reference in starting this project? I am guessing the
bmerge.c code
<https://github.com/Rdatatable/data.table/blob/master/src/bmerge.c> has
much of what I want. Any other code in the data.table package I should be
aware of? Any other advice that might make this process go more smoothly? I
know the function is based on a Modified Binary Search algorithm. Are there
any libraries anyone is aware of that might help this along?
I really appreciate all help.
Mike
--
View this message in context:
http://r.789695.n4.nabble.com/Rolling-Joins-Replicated-in-Java-MapReduce-tp4700329.html
Sent from the datatable-help mailing list archive at Nabble.com.
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help