[ 
https://issues.apache.org/jira/browse/SPARK-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231308#comment-15231308
 ] 

DB Tsai commented on SPARK-13944:
---------------------------------

For production use cases, it is not desirable to pull in the whole Spark stack 
just to use the linear algebra library, or even the models, in Spark mllib. Much 
of the time, those implementations can be standalone, with no dependency on the 
Spark platform. Because mllib currently depends on the Spark platform, using it 
in production often causes jar conflicts, and people end up reimplementing the 
same code for production. 

The goal of this PR is only to separate the local linear algebra out from 
mllib and to set up a build that can provide the mllib-local jar. The long-term 
goal is to gradually move the platform-independent code out of mllib and into 
mllib-local, so people can easily use it in their production apps. 

> Separate out local linear algebra as a standalone module without Spark 
> dependency
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-13944
>                 URL: https://issues.apache.org/jira/browse/SPARK-13944
>             Project: Spark
>          Issue Type: New Feature
>          Components: Build, ML
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Assignee: DB Tsai
>            Priority: Blocker
>
> Separate out linear algebra as a standalone module without Spark dependency 
> to simplify production deployment. We can call the new module 
> spark-mllib-local, which might contain local models in the future.
> The major issue is to remove dependencies on user-defined types.
> The package name will be changed from mllib to ml. For example, Vector will 
> be changed from `org.apache.spark.mllib.linalg.Vector` to 
> `org.apache.spark.ml.linalg.Vector`. The return vector type in the new ML 
> pipeline will be the one in ML package; however, the existing mllib code will 
> not be touched. As a result, this will potentially break the API. Also, when 
> a vector is loaded from an mllib vector by Spark SQL, it will be 
> automatically converted into the one in the ml package.
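
To illustrate what the description above is aiming for, here is a minimal sketch of local linear algebra used through the renamed `org.apache.spark.ml.linalg` package. It assumes a Spark 2.x build in which the new module is published and on the classpath; the object name `LocalLinalgSketch` is hypothetical, and no SparkContext is created.

```scala
// Sketch only: assumes the mllib-local module from a Spark 2.x build is on
// the classpath. LocalLinalgSketch is a hypothetical name for illustration.
import org.apache.spark.ml.linalg.{Matrices, Vector, Vectors}

object LocalLinalgSketch {
  def main(args: Array[String]): Unit = {
    // Dense and sparse vectors from the ml.linalg package.
    val dense: Vector = Vectors.dense(1.0, 0.0, 3.0)
    val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

    // Both hold the same values, so the squared distance between them is zero.
    println(Vectors.sqdist(dense, sparse))

    // A 2x3 matrix (values given in column-major order) times a local vector.
    val m = Matrices.dense(2, 3, Array(1.0, 0.0, 0.0, 1.0, 1.0, 1.0))
    println(m.multiply(dense))
  }
}
```

Nothing here touches an RDD, a DataFrame, or a user-defined type, which is the property that would let an application depend on the local module alone.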




