[ 
https://issues.apache.org/jira/browse/SPARK-16365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368310#comment-15368310
 ] 

Nick Pentreath commented on SPARK-16365:
----------------------------------------

Good question - and part of the reason for getting a discussion going here. In 
general (IMO) the short answer is "no" - I think Spark should be the tool for 
training models on moderately large to extremely large datasets, but not 
necessarily a tool for completely general machine learning.

I think the idea behind {{mllib-local}} is potentially two-fold: (i) make it 
easier to use Spark models / pipelines in production scenarios, and (ii) 
enhance linalg primitives available to devs / users.

> Ideas for moving "mllib-local" forward
> --------------------------------------
>
>                 Key: SPARK-16365
>                 URL: https://issues.apache.org/jira/browse/SPARK-16365
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: Nick Pentreath
>
> Since SPARK-13944 is all done, we should all think about what the "next 
> steps" might be for {{mllib-local}}. E.g., it could be "improve Spark's 
> linear algebra", or "investigate how we will implement local models/pipelines 
> in Spark", etc.
> This ticket is for comments, ideas, brainstorming, and PoCs. The separation 
> of linalg into a standalone project turned out to be significantly more 
> complex than originally expected. So I vote we devote sufficient discussion 
> and time to planning out the next move :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
