Yes, discarding such features is the best option currently supported.
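
As a rough illustration of the discarding step, here is a minimal sketch in plain Python. It does not use any Spark API, and the helper names (constant_feature_indices, drop_features) are hypothetical, chosen only for this example. It assumes the dataset fits in memory as a list of equal-length rows.

```python
def constant_feature_indices(rows):
    """Return indices of columns that hold fewer than two distinct values."""
    if not rows:
        return []
    distinct = [set() for _ in rows[0]]
    for row in rows:
        for i, value in enumerate(row):
            distinct[i].add(value)
    return [i for i, values in enumerate(distinct) if len(values) < 2]


def drop_features(rows, to_drop):
    """Return copies of the rows without the columns listed in to_drop."""
    drop = set(to_drop)
    return [[v for i, v in enumerate(row) if i not in drop] for row in rows]


# Toy data: column 1 is always 0.0, like a pixel that is blank in every image.
rows = [
    [1.0, 0.0, 3.0],
    [2.0, 0.0, 3.5],
    [1.0, 0.0, 4.0],
]
constant = constant_feature_indices(rows)  # columns to discard
filtered = drop_features(rows, constant)   # rows ready for training
```

Note that dropping columns shifts the indices of the remaining features, so any categoricalFeaturesInfo map keyed by feature index would have to be re-indexed to match before it is passed to the decision tree.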

On Wed, Jun 3, 2015 at 12:10 PM, Maheshakya Wijewardena <[email protected]> wrote:

> Spark's decision tree does not accept datasets in which a feature has only
> a single value. It produces the following error:
>
>> requirement failed: DecisionTree Strategy given invalid
>> categoricalFeaturesInfo setting: feature 645 has 1 categories.  The number
>> of categories should be >= 2
>>
>
> This is not an uncommon scenario, since large datasets can contain features
> with only a single value (see the training data in [1], for example). As
> the error is raised by Spark, we need a way to handle such datasets
> externally.
>
> One possible solution is to allow users to discard features (columns), so
> that they can drop features with a single value before training a
> decision tree. Please suggest any other feasible solutions.
>
> Best regards,
>
> [1] https://www.kaggle.com/c/digit-recognizer
> --
> Pruthuvi Maheshakya Wijewardena
> Software Engineer
> WSO2 Lanka (Pvt) Ltd
> Email: [email protected]
> Mobile: +94711228855
>
>
>


-- 

Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
