Please consider using other classification models such as logistic
regression or gradient-boosted trees (GBT). Naive Bayes usually treats
features as counts, which makes it unsuitable for features generated by a
one-hot encoder.
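
For reference, the behaviour the question asks for (occurrence counts for a
categorical field, mean and variance for a continuous field) can be sketched
in plain Java. This is only an illustration of the idea, not Spark's API: the
class MixedNaiveBayesSketch and all of its methods are hypothetical names, and
it assumes exactly one categorical and one continuous feature, add-one
smoothing for the counts, and a Gaussian for the continuous field.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy naive Bayes with one categorical and one continuous feature (hypothetical, not Spark). */
final class MixedNaiveBayesSketch {
    // Per-class statistics gathered while adding training rows.
    private final Map<String, Integer> classCounts = new HashMap<>();
    private final Map<String, Map<String, Integer>> categoryCounts = new HashMap<>();
    private final Map<String, Double> sum = new HashMap<>();
    private final Map<String, Double> sumSq = new HashMap<>();
    private final Set<String> categories = new HashSet<>();
    private int total = 0;

    /** Records one training row: class label, categorical value, continuous value. */
    void add(String label, String category, double value) {
        classCounts.merge(label, 1, Integer::sum);
        categoryCounts.computeIfAbsent(label, k -> new HashMap<>())
                      .merge(category, 1, Integer::sum);
        sum.merge(label, value, Double::sum);
        sumSq.merge(label, value * value, Double::sum);
        categories.add(category);
        total++;
    }

    /** P(category | label) from occurrence counts, with add-one smoothing. */
    double categoricalLikelihood(String label, String category) {
        int n = classCounts.getOrDefault(label, 0);
        int c = categoryCounts
                .getOrDefault(label, Collections.<String, Integer>emptyMap())
                .getOrDefault(category, 0);
        return (c + 1.0) / (n + categories.size());
    }

    /** Gaussian density using the class mean and population variance; label must have been seen. */
    double gaussianLikelihood(String label, double value) {
        int n = classCounts.get(label);
        double mean = sum.get(label) / n;
        // Floor the variance so a class with identical values does not divide by zero.
        double variance = Math.max(sumSq.get(label) / n - mean * mean, 1e-9);
        double diff = value - mean;
        return Math.exp(-diff * diff / (2 * variance))
                / Math.sqrt(2 * Math.PI * variance);
    }

    /** Unnormalised log posterior: log prior plus the log likelihood of each feature. */
    double logScore(String label, String category, double value) {
        double prior = (double) classCounts.get(label) / total;
        return Math.log(prior)
                + Math.log(categoricalLikelihood(label, category))
                + Math.log(gaussianLikelihood(label, value));
    }

    /** Returns the seen label with the highest score, or null if nothing was added. */
    String predict(String category, double value) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : classCounts.keySet()) {
            double score = logScore(label, category, value);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

To try it, call add(label, gender, age) once per training row and then
predict(gender, age). The gender value stays a plain string, so it is only
ever counted and never averaged.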

Thanks
Yanbo

On Wed, May 31, 2017 at 3:58 PM, Amlan Jyoti <amlan.jy...@tcs.com> wrote:

> Hi,
>
> I am trying to run a Naive Bayes model using the Spark ML libraries in Java.
> A sample snippet of the dataset is given below:
>
> *Raw Data* -
>
>
> But, as the input data needs to be numeric, I am using a
> *one-hot encoder* on the Gender field [m->0,1] [f->1,0]; finally,
> the 'features' vector is fed to the model, and I get the output.
>
> *Transformed Data* -
>
>
> But the model *results are not correct*, because the 'Gender' field (originally
> categorical) is now treated as a continuous field after the one-hot encoding
> transformation.
>
> The *expectation* is that the mean and variance are calculated for
> 'continuous data', and the number of occurrences of each category is
> calculated for 'categorical data'. [In my case, means and variances are
> calculated even for the Gender field.]
>
> So, is there any way by which I can indicate to the model that a
> particular data field is 'categorical' by nature?
>
> Thanks
>
> Best Regards
> Amlan Jyoti