[ https://issues.apache.org/jira/browse/SPARK-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616154#comment-14616154 ]

Joseph K. Bradley commented on SPARK-3164:
------------------------------------------

[~rekhajoshm] My apologies!  I should have updated this JIRA earlier; it was 
already fixed for spark.ml, and we will not be able to fix it for spark.mllib 
(because we need to maintain API stability).  I'll close this JIRA.

In the future, please comment on a JIRA before beginning work to check for 
updates and to notify others working on it.

Thank you!

> Store DecisionTree Split.categories as Set
> ------------------------------------------
>
>                 Key: SPARK-3164
>                 URL: https://issues.apache.org/jira/browse/SPARK-3164
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Priority: Trivial
>
> Improvement: computation
> For categorical features with many categories, it could be more efficient to 
> store Split.categories as a Set, not a List.  (It is currently a List.)  A 
> Set might be more scalable (for log n lookups), though tests would need to be 
> done to ensure that Sets do not incur too much more overhead than Lists.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
