[ https://issues.apache.org/jira/browse/SPARK-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964405#comment-13964405 ]

Xiangrui Meng commented on SPARK-1357:
--------------------------------------

Hi Sean, 

Actually, you came in just in time. This was only the first pass, and we are
still accepting API visibility/annotation patches during the QA period. MLlib
is still a beta component of Spark, so "1.0" doesn't mean it is stable. We also
still accept additions to MLlib (for JIRAs submitted before April 1), as Patrick
announced on the dev mailing list.

(I do want to mark all of MLlib as experimental to reserve the right to change
it in the future, but we need to find a balance here.)

I agree that switching the id type from Int to Long in ALS is more
future-proof. The extra storage requirement is 8 bytes per rating (the user id
and the product id each grow from 4 to 8 bytes). Inside ALS, we also
re-partition the ratings, which needs extra storage. We need to consider
whether to switch to Long completely or to provide an option to use Long ids.
Could you submit a patch, either marking ALS experimental or allowing Long ids?

I don't think a String type is necessary, because we can always create a map
between String ids and Long ids, and a String id usually costs more memory than
a Long id. For the same reason, classification uses Double for labels.

Please submit a patch for any API you don't feel comfortable calling "stable",
or for any API I marked "experimental/developer" that you think should be
stable. It would be great to keep the discussion going. Thanks!

Best,
Xiangrui

> [MLLIB] Annotate developer and experimental API's
> -------------------------------------------------
>
>                 Key: SPARK-1357
>                 URL: https://issues.apache.org/jira/browse/SPARK-1357
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 1.0.0
>            Reporter: Patrick Wendell
>            Assignee: Xiangrui Meng
>             Fix For: 1.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
