I am ready to order a t-shirt with "Go, Andy! +100" across it if it makes any pragmatic sense.
On Apr 13, 2014 11:11 PM, "Sebastian Schelter" <[email protected]> wrote:
> On 04/14/2014 08:00 AM, Dmitriy Lyubimov wrote:
>
>> not all things unfortunately map gracefully into algebra. But hopefully
>> some of the whole can still be.
>
> Yes, that's why I was asking Andy if there are enough constructs. If not,
> we might have to add more.
>
>> I am even a little bit worried that we may develop almost too much (is
>> there such a thing) of ML before we have a chance to crystallize data
>> frames and perhaps dictionary discussions. These are more tools to keep
>> abstracted.
>
> I think it's a very good thing to have early ML implementations on the
> DSL, because it allows us to validate whether we are on the right path. We
> should start with providing the things that are most popular in Mahout,
> like the item-based recommender from MAHOUT-1464. Having a few
> implementations on the DSL also helps with designing new abstractions,
> because for every proposed feature we can look at the existing code and see
> how helpful the new feature would be.
>
>> I just don't want Mahout to be yet another mllib. I shudder every time
>> somebody says "we want to create a Spark version of (an|the) algorithm".
>> I know it will be creating the wrong talking points for somebody anxious
>> to draw parallels.
>
> Totally agree here. Looks like history repeats itself, from "I want to
> create a Hadoop implementation" to "I want to create a Spark
> implementation" :)
>
>> On Sun, Apr 13, 2014 at 10:51 PM, Sebastian Schelter <[email protected]>
>> wrote:
>>
>>> Andy, that would be awesome. Have you had a look at our new scala DSL
>>> [1]? Does it offer enough constructs for you to rewrite your
>>> implementation with it?
>>>
>>> --sebastian
>>>
>>> [1] https://mahout.apache.org/users/sparkbindings/home.html
>>>
>>> On 04/14/2014 07:47 AM, Andy Twigg wrote:
>>>
>>>>> +1 to removing present Random Forests. Andy Twigg had provided a
>>>>> Spark based Streaming Random Forests impl sometime last year. It's
>>>>> time to restart that conversation and integrate that into the
>>>>> codebase if the contributor is still willing, i.e.
>>>>
>>>> I'm happy to contribute this, but as it stands it's written against
>>>> Spark, even forgetting the 'streaming' aspect. Do you have any advice
>>>> on how to proceed?
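[Editor's note: for readers unfamiliar with the scala DSL referenced as [1] above, here is a minimal illustrative sketch of the kind of construct it offers, based on the sparkbindings documentation of that period. This is not runnable standalone code; it assumes the Mahout math-scala and spark bindings artifacts on the classpath and a local Spark master, and the matrix values are made up for illustration.]

```scala
// Sketch only: assumes org.apache.mahout:mahout-math-scala and
// mahout-spark dependencies, plus a reachable Spark master.
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

// Distributed context backed by a local Spark master (illustrative).
implicit val ctx = mahoutSparkContext(masterUrl = "local", appName = "dsl-sketch")

// Build a small in-core matrix and promote it to a distributed row matrix (DRM).
val inCoreA = dense((1, 2, 3), (3, 4, 5))
val drmA = drmParallelize(inCoreA)

// Express A'A algebraically; the DSL's optimizer chooses the physical plan,
// so the same expression runs unchanged on different backends.
val drmAtA = drmA.t %*% drmA

// Bring the (small) result back in-core.
val inCoreAtA = drmAtA.collect
```

The point of the thread's question to Andy is whether algebraic constructs like these (`%*%`, `.t`, `drmParallelize`, `mapBlock`) are expressive enough to port an algorithm such as streaming random forests without writing directly against the Spark API.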
