Hello all, I absolutely love the rolling join capabilities of data.table. They are extremely useful for the work I do. However, I sometimes work with data that is too large to fit into RAM, even on a large server. I want to implement the rolling join in a Java MapReduce setting so that I can leverage some of the other resources available at the company I work for. Unfortunately, I am not an experienced Java programmer, but I figured that a project like this would provide an excellent incentive to learn.
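To make concrete what I am trying to reproduce, here is a minimal Java sketch of a rolling join as I understand it. It assumes `roll=TRUE` semantics (the last observation at or before the lookup key is carried forward), a single numeric key that is already sorted, and data small enough to sit in arrays. The class and method names (`RollingJoin`, `rollBackwardIndex`, `rollingJoin`) are my own placeholders and are not taken from data.table; it is a plain binary search, not a port of bmerge.c.

```java
import java.util.Arrays;

public class RollingJoin {

    /**
     * Binary search for the index of the last key that is <= target.
     * Returns -1 when every key is greater than target (nothing to roll forward).
     * Assumes sortedKeys is in ascending order.
     */
    static int rollBackwardIndex(long[] sortedKeys, long target) {
        int lo = 0;
        int hi = sortedKeys.length - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;      // overflow-safe midpoint
            if (sortedKeys[mid] <= target) {
                found = mid;                // candidate match; look further right
                lo = mid + 1;
            } else {
                hi = mid - 1;               // key too large; look left
            }
        }
        return found;
    }

    /**
     * For each lookup key, return the value attached to the most recent
     * key at or before it, or null when no such key exists.
     */
    static Double[] rollingJoin(long[] sortedKeys, double[] values, long[] lookups) {
        Double[] out = new Double[lookups.length];
        for (int i = 0; i < lookups.length; i++) {
            int idx = rollBackwardIndex(sortedKeys, lookups[i]);
            if (idx >= 0) {
                out[i] = values[idx];
            } else {
                out[i] = null;              // plays the role of NA
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] keys = {10, 20, 30};
        double[] vals = {1.0, 2.0, 3.0};
        long[] lookups = {5, 10, 25, 40};
        // prints [null, 1.0, 2.0, 3.0]
        System.out.println(Arrays.toString(rollingJoin(keys, vals, lookups)));
    }
}
```

In a MapReduce job I imagine something like this would run per group inside a reducer, after the rows have been partitioned by the non-rolling join columns and sorted on the rolling column, but that layout is only my assumption at this point.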
My question is this: which parts of the current data.table code for rolling joins would be most useful to reference when starting this project? I am guessing that bmerge.c <https://github.com/Rdatatable/data.table/blob/master/src/bmerge.c> has much of what I want. Is there any other code in the data.table package I should be aware of? Any other advice that might make this process go more smoothly? I know the function is based on a modified binary search algorithm. Is anyone aware of any libraries that might help this along? I really appreciate any help.

Mike
