[
https://issues.apache.org/jira/browse/SPARK-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964405#comment-13964405
]
Xiangrui Meng commented on SPARK-1357:
--------------------------------------
Hi Sean,
Actually, you came in just in time. This was only the first pass, and we are
still accepting API visibility/annotation patches during the QA period. MLlib
is still a beta component of Spark, so "1.0" doesn't mean it is stable. And we
still accept additions (JIRAs submitted before April 1) to MLlib, as Patrick
announced on the dev mailing list.
(I do want to mark all of MLlib experimental to reserve the right to change it
in the future, but we need to find a balance point here.)
I agree that switching the id type from Int to Long in ALS would be more
future-proof. The extra storage cost is 8 bytes per rating (4 more bytes for
each of the user and product ids). Inside ALS, we also re-partition the
ratings, which needs extra storage. We need to consider whether to switch to
Long completely or to provide an option to use Long ids. Could you submit a
patch that either marks ALS experimental or allows Long ids?
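To make the 8 bytes concrete, here is a rough sketch in Python that only
counts the raw payload of one rating (two ids plus a Double value). It assumes
a packed layout with no padding and ignores JVM object headers, serialization
overhead, and the extra copies made during re-partitioning, so treat it as a
lower bound rather than a measurement. The name extra_bytes is made up for
this illustration.

```python
import struct

# Assumed payload of one rating: (user id, product id, rating value).
# "<" selects standard sizes with no padding: i = 4 bytes, q = 8, d = 8.
INT_IDS = struct.calcsize("<iid")   # two Int ids + one Double
LONG_IDS = struct.calcsize("<qqd")  # two Long ids + one Double
EXTRA_PER_RATING = LONG_IDS - INT_IDS


def extra_bytes(num_ratings):
    """Raw extra payload, in bytes, when both ids grow from Int to Long."""
    return num_ratings * EXTRA_PER_RATING
```

For one billion ratings this comes to roughly 8 GB of additional raw id
storage before any overhead.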
I don't think a String type is necessary because we can always create a map
between String ids and Long ids. A String id usually costs more than a Long id.
For the same reason, classification uses Double for labels.
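As a minimal sketch of the mapping idea, the plain-Python snippet below
assigns each distinct String id an integer id before the ratings are handed to
a recommender, and keeps a reverse map for translating results back. It does
not use any Spark API; build_id_map and the sample ratings are hypothetical,
and Python integers stand in for Long ids.

```python
def build_id_map(string_ids):
    """Assign each distinct string id an integer id in first-seen order."""
    id_map = {}
    for s in string_ids:
        if s not in id_map:
            id_map[s] = len(id_map)
    return id_map


# Hypothetical ratings keyed by string ids: (user, item, rating).
ratings = [
    ("alice", "item-9", 4.0),
    ("bob", "item-2", 1.5),
    ("alice", "item-2", 3.0),
]

user_ids = build_id_map(u for u, _, _ in ratings)
item_ids = build_id_map(i for _, i, _ in ratings)

# Ratings re-keyed with integer ids, ready for an id-based algorithm.
encoded = [(user_ids[u], item_ids[i], r) for u, i, r in ratings]

# Reverse map to turn integer ids in the output back into strings.
reverse_user_ids = {v: k for k, v in user_ids.items()}
```

On a large dataset the map itself has to be built and stored somewhere, but
each String is then stored once instead of once per rating.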
Please submit a patch for any APIs that you don't feel comfortable calling
"stable", or that I marked "experimental/developer" but you think should not
be. It would be great to keep the discussion going. Thanks!
Best,
Xiangrui
> [MLLIB] Annotate developer and experimental API's
> -------------------------------------------------
>
> Key: SPARK-1357
> URL: https://issues.apache.org/jira/browse/SPARK-1357
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib
> Affects Versions: 1.0.0
> Reporter: Patrick Wendell
> Assignee: Xiangrui Meng
> Fix For: 1.0.0
>
>