[
https://issues.apache.org/jira/browse/SPARK-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616154#comment-14616154
]
Joseph K. Bradley commented on SPARK-3164:
------------------------------------------
[~rekhajoshm] My apologies! I should have updated this JIRA before; it was
fixed already for spark.ml, and we will not be able to fix it for spark.mllib
(because of the need to maintain API stability). I'll close this JIRA.
In the future, please comment on a JIRA before beginning work to check for
updates and to notify others working on it.
Thank you!
> Store DecisionTree Split.categories as Set
> ------------------------------------------
>
> Key: SPARK-3164
> URL: https://issues.apache.org/jira/browse/SPARK-3164
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Joseph K. Bradley
> Priority: Trivial
>
> Improvement: computation
> For categorical features with many categories, it could be more efficient to
> store Split.categories as a Set, not a List. (It is currently a List.) A
> Set might be more scalable (for log n lookups), though tests would need to be
> done to ensure that Sets do not incur too much more overhead than Lists.
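
To illustrate the trade-off described in the issue, here is a minimal sketch in Python rather than Spark's Scala. ListSplit and SetSplit are hypothetical stand-ins, not Spark's actual Split class, and goes_left is an assumed name for the membership check. A list-backed lookup scans every category, while a hash-based set gives expected constant-time lookups (a tree-based set would give the log n lookups mentioned above).

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class ListSplit:
    """Hypothetical categorical split that stores its categories as a list."""
    feature: int
    categories: List[float]

    def goes_left(self, value: float) -> bool:
        # Linear scan: cost grows with the number of categories.
        return value in self.categories


@dataclass
class SetSplit:
    """Hypothetical categorical split that stores its categories as a set."""
    feature: int
    categories: FrozenSet[float]

    def goes_left(self, value: float) -> bool:
        # Hash lookup: expected constant time, independent of set size.
        return value in self.categories


if __name__ == "__main__":
    cats = [float(i) for i in range(1000)]
    list_split = ListSplit(feature=0, categories=cats)
    set_split = SetSplit(feature=0, categories=frozenset(cats))
    # Both representations must agree on membership.
    print(list_split.goes_left(999.0), set_split.goes_left(999.0))
```

Both versions answer the same membership question, so any benefit from the set shows up only in lookup cost and memory use. For splits with only a few categories the list may well be competitive, which is why the issue calls for measurements first.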