[
https://issues.apache.org/jira/browse/SPARK-16365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368310#comment-15368310
]
Nick Pentreath commented on SPARK-16365:
----------------------------------------
Good question - and part of the reason for getting a discussion going here. In
general (IMO) the short answer is "no" - I think Spark should be the tool for
training models on moderately large to extremely large datasets, but not
necessarily for completely general machine learning.
I think the idea behind {{mllib-local}} is potentially two-fold: (i) make it
easier to use Spark models / pipelines in production scenarios, and (ii)
enhance linalg primitives available to devs / users.
> Ideas for moving "mllib-local" forward
> --------------------------------------
>
> Key: SPARK-16365
> URL: https://issues.apache.org/jira/browse/SPARK-16365
> Project: Spark
> Issue Type: Brainstorming
> Components: ML
> Reporter: Nick Pentreath
>
> Since SPARK-13944 is all done, we should all think about what the "next
> steps" might be for {{mllib-local}}. E.g., it could be "improve Spark's
> linear algebra", or "investigate how we will implement local models/pipelines
> in Spark", etc.
> This ticket is for comments, ideas, brainstorming, and PoCs. The separation
> of linalg into a standalone project turned out to be significantly more
> complex than originally expected. So I vote we devote sufficient discussion
> and time to planning out the next move :)