That sounds useful.  Would you mind creating a JIRA for it?  Thanks!
Joseph

On Mon, Apr 11, 2016 at 2:06 AM, Rahul Tanwani <tanwanira...@gmail.com>
wrote:

> Hi,
>
> Currently the RandomForest algorithm takes a single maxBins value to decide
> the number of splits to consider. Training time can become very high when a
> single categorical column has a large number of unique values: that one
> column forces the same high maxBins on all the numeric (continuous) columns,
> even though they do not need that many splits.
>
> Encoding the categorical column into separate features makes the data very
> wide, which requires us to increase maxMemoryInMB and puts more pressure on
> the GC as well.
>
> Keeping separate maxBins values for categorical and continuous features
> should be useful in this regard.
>
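
To make the cost described above concrete, here is a minimal Python sketch of the bin counts involved. It does not call Spark. It only assumes the documented constraint that maxBins must be at least the number of categories of every categorical feature, and it treats maxBins as an upper bound on the bins of each continuous feature. The function name and the max_continuous_bins parameter are hypothetical, standing in for the proposed separate setting; they are not part of any existing API.

```python
def bins_per_feature(categorical_arities, num_continuous, max_bins,
                     max_continuous_bins=None):
    """Return an upper bound on the number of bins for each feature.

    categorical_arities: number of distinct values per categorical feature.
    num_continuous: number of continuous features.
    max_bins: the single cap, which must cover the largest categorical arity.
    max_continuous_bins: HYPOTHETICAL separate cap for continuous features
        (the proposal in this thread); None means "reuse max_bins".
    Simplification: each categorical feature is counted as one bin per value.
    """
    largest = max(categorical_arities, default=0)
    if max_bins < largest:
        raise ValueError("max_bins must be >= the largest categorical arity")
    cap = max_bins if max_continuous_bins is None else max_continuous_bins
    return list(categorical_arities) + [cap] * num_continuous


# One categorical column with 1000 unique values plus 20 continuous columns.
single_cap = bins_per_feature([1000], 20, max_bins=1000)
separate_caps = bins_per_feature([1000], 20, max_bins=1000,
                                 max_continuous_bins=32)

print("single maxBins, total bins:", sum(single_cap))
print("separate caps, total bins: ", sum(separate_caps))
```

Under these assumptions, the single categorical column is what pushes every continuous column up to the high cap; a separate continuous cap would leave the categorical column unchanged while shrinking the rest.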
