Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Vishnu Viswanath
Thank you Yanbo. It looks like this is available only in version 1.6. Can you tell me how/when I can download version 1.6? Thanks and Regards, Vishnu Viswanath. On Wed, Dec 2, 2015 at 4:37 AM, Yanbo Liang wrote: > You can set "handleInvalid" to "skip" which helps you skip
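
The difference between failing on an unseen label and skipping it can be sketched in plain Python. This is only an illustration of the behavior discussed in the thread, not Spark's implementation: `index_labels`, its `handle_invalid` argument, and the sample mapping are hypothetical names and data chosen for this sketch, and it needs no Spark installation.

```python
def index_labels(mapping, labels, handle_invalid="error"):
    """Apply a previously learned label -> index mapping to new labels.

    handle_invalid="error": raise on a label the mapping has never seen.
    handle_invalid="skip":  silently drop rows whose label is unseen.
    (Hypothetical helper for illustration; not part of Spark.)
    """
    indexed = []
    for label in labels:
        if label in mapping:
            indexed.append((label, mapping[label]))
        elif handle_invalid == "skip":
            continue  # drop the row instead of failing
        else:
            raise ValueError(f"Unseen label: {label}")
    return indexed


# Mapping learned at training time (assumed for this example).
mapping = {"A": 0.0, "B": 1.0}
# Prediction-time data containing a label "D" that was never seen in training.
new_labels = ["A", "D", "B"]

skipped = index_labels(mapping, new_labels, handle_invalid="skip")

try:
    index_labels(mapping, new_labels)  # default mode: fail on "D"
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)
```

With "skip", the row carrying "D" is dropped and the remaining rows keep the indices learned at training time. The default mode stops at the first unseen label.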

Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Yanbo Liang
You can currently get 1.6.0-RC1 from http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/ but it's not the final release version. 2015-12-02 23:57 GMT+08:00 Vishnu Viswanath: > Thank you Yanbo, > > It looks like this is available in 1.6 version

Re: General question on using StringIndexer in SparkML

2015-12-02 Thread Vishnu Viswanath
Thank you. On Wed, Dec 2, 2015 at 8:12 PM, Yanbo Liang wrote: > You can get 1.6.0-RC1 from > http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/ > currently, but it's not the last release version. > > 2015-12-02 23:57 GMT+08:00 Vishnu Viswanath

Re: General question on using StringIndexer in SparkML

2015-12-01 Thread Vishnu Viswanath
Hi Jeff, I went through the link you provided and I now understand how fit() and transform() work. I tried to use the pipeline in my code and I am getting an exception: Caused by: org.apache.spark.SparkException: Unseen label: The reason for this error, as per my understanding, is: For the

Re: General question on using StringIndexer in SparkML

2015-11-29 Thread Vishnu Viswanath
Thanks for the reply, Yanbo. I understand that the model will be trained using the indexer map created during the training stage. But I am getting a new set of data during prediction, and I have to do StringIndexing on the new data as well. Right now I am using a new StringIndexer for this

Re: General question on using StringIndexer in SparkML

2015-11-29 Thread Jeff Zhang
StringIndexer is an estimator: fitting it trains a model that is used in both training and prediction, so the indexing is consistent between training and prediction. You may want to read this section of the Spark ML docs: http://spark.apache.org/docs/latest/ml-guide.html#how-it-works On Mon, Nov 30, 2015 at 12:52
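
A minimal plain-Python sketch of why fitting once and reusing the fitted mapping keeps indices consistent, whereas fitting a second indexer on the prediction data does not. `fit_indexer` and `transform` are hypothetical stand-ins for the estimator/model split, not Spark's API. The alphabetical tie-break is an assumption of this sketch, and the sample data is made up.

```python
from collections import Counter


def fit_indexer(labels):
    """Learn a label -> index map where the most frequent label gets 0.0.

    Ties are broken alphabetically (an assumption made for this sketch).
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform(mapping, labels):
    """Apply an already fitted mapping; no frequencies are recomputed."""
    return [mapping[label] for label in labels]


train = ["A", "A", "A", "B", "B", "C"]  # A is most frequent in training
new = ["B", "B", "B", "A", "C"]         # B is most frequent at prediction time

mapping = fit_indexer(train)            # fit once, on training data
new_idx = transform(mapping, new)       # reuse the same mapping: consistent

# Fitting a second indexer on the new data assigns different indices.
refit_idx = transform(fit_indexer(new), new)
```

Here "B" keeps index 1.0 when the training-time mapping is reused, but becomes 0.0 if a fresh indexer is fitted on the new data. That mismatch is what reusing the fitted model avoids.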

Re: General question on using StringIndexer in SparkML

2015-11-29 Thread Vishnu Viswanath
Thank you Jeff. On Sun, Nov 29, 2015 at 7:36 PM, Jeff Zhang wrote: > StringIndexer is an estimator which would train a model to be used both in > training & prediction. So it is consistent between training & prediction. > > You may want to read this section of spark ml doc >

General question on using StringIndexer in SparkML

2015-11-28 Thread Vishnu Viswanath
Hi All, I have a general question on using StringIndexer. StringIndexer gives an index to each label in the feature, starting from 0 (0 for the most frequent word). Suppose I am building a model, and I use StringIndexer for transforming one of my columns. e.g., suppose A was most frequent word