Ok, sounds very promising... I'll try to start digging into the driver part this week then (Pipeline wrapper in R5).

On Sun, Oct 28, 2012 at 11:56 AM, Josh Wills <[email protected]> wrote:
> On Fri, Oct 26, 2012 at 2:40 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> Ok, cool.
>>
>> So what state is Crunch in? I take it is in a fairly advanced state.
>> So every api mentioned in the FlumeJava paper is working , right? Or
>> there's something that is not working specifically?
>
> I think the only thing in the paper that we don't have in a working
> state is MSCR fusion. It's mostly just a question of prioritizing it
> and getting the work done.
>
>>
>> On Fri, Oct 26, 2012 at 2:31 PM, Josh Wills <[email protected]> wrote:
>>> Hey Dmitriy,
>>>
>>> Got a fork going and looking forward to playing with crunchR this weekend--
>>> thanks!
>>>
>>> J
>>>
>>> On Wed, Oct 24, 2012 at 1:28 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>
>>>> Project template https://github.com/dlyubimov/crunchR
>>>>
>>>> Default profile does not compile R artifact . R profile compiles R
>>>> artifact. for convenience, it is enabled by supplying -DR to mvn
>>>> command line, e.g.
>>>>
>>>> mvn install -DR
>>>>
>>>> there's also a helper that installs the snapshot version of the
>>>> package in the crunchR module.
>>>>
>>>> There's RJava and JRI java dependencies which i did not find anywhere
>>>> in public maven repos; so it is installed into my github maven repo so
>>>> far. Should compile for 3rd party.
>>>>
>>>> -DR compilation requires R, RJava and optionally, RProtoBuf. R Doc
>>>> compilation requires roxygen2 (i think).
>>>>
>>>> For some reason RProtoBuf fails to import into another package, got a
>>>> weird exception when i put @import RProtoBuf into crunchR, so
>>>> RProtoBuf is now in "Suggests" category. Down the road that may be a
>>>> problem though...
>>>>
>>>> other than the template, not much else has been done so far... finding
>>>> hadoop libraries and adding it to the package path on initialization
>>>> via "hadoop classpath"... adding Crunch jars and its non-"provided"
>>>> transitives to the crunchR's java part...
>>>>
>>>> No legal stuff...
>>>>
>>>> No readmes... complete stealth at this point.
>>>>
>>>> On Thu, Oct 18, 2012 at 12:35 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>> > Ok, cool. I will try to roll project template by some time next week.
>>>> > we can start with prototyping and benchmarking something really
>>>> > simple, such as parallelDo().
>>>> >
>>>> > My interim goal is to perhaps take some more or less simple algorithm
>>>> > from Mahout and demonstrate it can be solved with Rcrunch (or whatever
>>>> > name it has to be) in a comparable time (performance) but with much
>>>> > fewer lines of code. (say one of factorization or clustering things)
>>>> >
>>>> >
>>>> > On Wed, Oct 17, 2012 at 10:24 PM, Rahul <[email protected]> wrote:
>>>> >> I am not much of R user but I am interested to see how well we can integrate
>>>> >> the two. I would be happy to help.
>>>> >>
>>>> >> regards,
>>>> >> Rahul
>>>> >>
>>>> >> On 18-10-2012 04:04, Josh Wills wrote:
>>>> >>>
>>>> >>> On Wed, Oct 17, 2012 at 3:07 PM, Dmitriy Lyubimov <[email protected]>
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> Yep, ok.
>>>> >>>>
>>>> >>>> I imagine it has to be an R module so I can set up a maven project
>>>> >>>> with java/R code tree (I have been doing that a lot lately). Or if you
>>>> >>>> have a template to look at, it would be useful i guess too.
>>>> >>>
>>>> >>> No, please go right ahead.
>>>> >>>
>>>> >>>>
>>>> >>>> On Wed, Oct 17, 2012 at 3:02 PM, Josh Wills <[email protected]> wrote:
>>>> >>>>>
>>>> >>>>> I'd like it to be separate at first, but I am happy to help. Github
>>>> >>>>> repo?
>>>> >>>>> On Oct 17, 2012 2:57 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
>>>> >>>>>
>>>> >>>>>> Ok maybe there's a benefit to try a JRI/RJava prototype on top of
>>>> >>>>>> Crunch for something simple. This should both save time and prove or
>>>> >>>>>> disprove if Crunch via RJava integration is viable.
>>>> >>>>>>
>>>> >>>>>> On my part i can try to do it within Crunch framework or we can keep
>>>> >>>>>> it completely separate.
>>>> >>>>>>
>>>> >>>>>> -d
>>>> >>>>>>
>>>> >>>>>> On Wed, Oct 17, 2012 at 2:08 PM, Josh Wills <[email protected]>
>>>> >>>>>> wrote:
>>>> >>>>>>>
>>>> >>>>>>> I am an avid R user and would be into it-- who gave the talk? Was it
>>>> >>>>>>> Murray Stokely?
>>>> >>>>>>>
>>>> >>>>>>> On Wed, Oct 17, 2012 at 2:05 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>> Hello,
>>>> >>>>>>>>
>>>> >>>>>>>> I was pretty excited to learn of Google's experience of R mapping of
>>>> >>>>>>>> flume java on one of recent BARUGs. I think a lot of applications
>>>> >>>>>>>> similar to what we do in Mahout could be prototyped using flume R.
>>>> >>>>>>>>
>>>> >>>>>>>> I did not quite get the details of Google implementation of R
>>>> >>>>>>>> mapping,
>>>> >>>>>>>> but i am not sure if just a direct mapping from R to Crunch would be
>>>> >>>>>>>> sufficient (and, for most part, efficient). RJava/JRI and jni seem to
>>>> >>>>>>>> be a pretty terrible performer to do that directly.
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>> on top of it, I am thinknig if this project could have a contributed
>>>> >>>>>>>> adapter to Mahout's distributed matrices, that would be just a very
>>>> >>>>>>>> good synergy.
>>>> >>>>>>>>
>>>> >>>>>>>> Is there anyone interested in contributing/advising for open source
>>>> >>>>>>>> version of flume R support? Just gauging interest, Crunch list seems
>>>> >>>>>>>> like a natural place to poke.
>>>> >>>>>>>>
>>>> >>>>>>>> Thanks .
>>>> >>>>>>>>
>>>> >>>>>>>> -Dmitriy
>>>> >>>>>>>
>>>> >>>>>>> --
>>>> >>>>>>> Director of Data Science
>>>> >>>>>>> Cloudera
>>>> >>>>>>> Twitter: @josh_wills
>>>> >>>
>>>> >>
>>>>
>>>
>>> --
>>> Director of Data Science
>>> Cloudera <http://www.cloudera.com>
>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
