You may want to look into Spark SQL. There is currently discussion on adding support for range joins <https://github.com/apache/spark/pull/2939>, which I think are similar to rolling joins in data.table.
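For what it's worth, the core of a rolling join on a single key can be sketched in plain Java as a binary search for the last row whose key is less than or equal to the lookup key (what data.table does with roll=TRUE, i.e. last observation carried forward). This is only a minimal sketch under stated assumptions: one numeric key, already sorted ascending, everything in memory. The class and method names (RollingJoin, rollIndex, rollingJoin) are made up for illustration, and this is not a port of bmerge.c.

```java
import java.util.Arrays;

// Hypothetical illustration of rolling-join (LOCF) lookup semantics.
class RollingJoin {

    // Returns the index of the last element in sortedKeys that is <= target,
    // or -1 if every key is greater than target (nothing to roll forward).
    // Assumes sortedKeys is sorted in ascending order.
    static int rollIndex(long[] sortedKeys, long target) {
        int lo = 0;
        int hi = sortedKeys.length - 1;
        int ans = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;   // overflow-safe midpoint
            if (sortedKeys[mid] <= target) {
                ans = mid;               // candidate match; look further right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return ans;
    }

    // For each lookup key, finds the matching row index in sortedKeys.
    static int[] rollingJoin(long[] sortedKeys, long[] lookups) {
        int[] out = new int[lookups.length];
        for (int i = 0; i < lookups.length; i++) {
            out[i] = rollIndex(sortedKeys, lookups[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] quoteTimes = {10, 20, 30};      // sorted keys of the table being joined to
        long[] tradeTimes = {5, 10, 25, 40};   // lookup keys
        // prints [-1, 0, 1, 2]
        System.out.println(Arrays.toString(rollingJoin(quoteTimes, tradeTimes)));
    }
}
```

In a MapReduce or Spark setting, the hard part is partitioning so that each lookup key lands with the rows it could roll from; the per-partition lookup itself could look something like the above.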
I started looking into rmr2, but Hive and Spark SQL look like better options for my use cases.

On Wed, Dec 3, 2014 at 6:00 AM, <[email protected]> wrote:

> Today's Topics:
>
>    1. Rolling Joins Replicated in Java MapReduce (Mike.Gahan)
>    2. Re: Rolling Joins Replicated in Java MapReduce (Michael Smith)
>
> ---------- Forwarded message ----------
> From: "Mike.Gahan" <[email protected]>
> To: [email protected]
> Date: Tue, 2 Dec 2014 19:47:38 -0800 (PST)
> Subject: [datatable-help] Rolling Joins Replicated in Java MapReduce
>
> Hello all,
>
> I absolutely love the rolling join capabilities of data.table. It is
> extremely useful for the work I do. However, sometimes I work with data
> that is too large to fit into RAM (even when using a large server). I want
> to implement this rolling join code in a Java Map Reduce setting to be able
> to leverage some of the other resources available at the company I work
> for. Unfortunately I am not an experienced Java programmer. I figured that
> a project like this would provide an excellent incentive to learn this
> skill.
>
> My question is this: what data.table current code for rolling joins would
> be most useful to reference in starting this project? I am guessing the
> bmerge.c code
> <https://github.com/Rdatatable/data.table/blob/master/src/bmerge.c> has
> much of what I want. Any other code in the data.table package I should be
> aware of? Any other advice that might make this process go more smoothly?
> I know the function is based on a Modified Binary Search algorithm. Are
> there any libraries anyone is aware of that might help this along?
>
> I really appreciate all help.
> Mike
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Rolling-Joins-Replicated-in-Java-MapReduce-tp4700329.html
> Sent from the datatable-help mailing list archive at Nabble.com.
>
> ---------- Forwarded message ----------
> From: Michael Smith <[email protected]>
> To: "Mike.Gahan" <[email protected]>, [email protected]
> Date: Wed, 03 Dec 2014 14:44:11 +0800
> Subject: Re: [datatable-help] Rolling Joins Replicated in Java MapReduce
>
> Maybe it is easier to build what you're looking for by contributing to
> plyrmr:
>
> https://github.com/RevolutionAnalytics/plyrmr
>
> It already implements "plyr for Hadoop" on top of the rmr2 package. Not
> sure whether merging is already implemented, but using rmr2 it should not
> be prohibitively difficult (hopefully).
>
> Best,
> M
_______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
