Thank you Yanbo,
It looks like this is available in 1.6 version only.
Can you tell me how and when I can download version 1.6?
Thanks and Regards,
Vishnu Viswanath,
On Wed, Dec 2, 2015 at 4:37 AM, Yanbo Liang wrote:
> You can set "handleInvalid" to "skip" which helps you skip
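
As a rough illustration of the two behaviors discussed in this thread, the plain-Python sketch below mimics what happens when a fitted label map meets a label it never saw during training. It is not Spark code: index_labels and its handle_invalid argument are hypothetical names that only mirror the "handleInvalid" setting, with "error" raising an exception and "skip" dropping the row.

```python
def index_labels(mapping, labels, handle_invalid="error"):
    """Index labels with a fitted map (hypothetical helper, not a Spark API)."""
    if handle_invalid not in ("error", "skip"):
        raise ValueError("handle_invalid must be 'error' or 'skip'")
    indexed = []
    for label in labels:
        if label in mapping:
            indexed.append((label, mapping[label]))
        elif handle_invalid == "skip":
            continue  # drop rows whose label was not seen during fitting
        else:
            raise ValueError("Unseen label: " + label)
    return indexed


mapping = {"A": 0, "B": 1}   # map learned from the training data
new_data = ["A", "D", "B"]   # "D" never appeared in training
kept = index_labels(mapping, new_data, handle_invalid="skip")
```

With "skip", the row containing "D" is silently dropped; with the default "error" mode of this sketch, the same call raises instead. Note that skipping means no prediction is produced for those rows at all.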
You can get 1.6.0-RC1 from
http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
currently, but it's not the final release version.
2015-12-02 23:57 GMT+08:00 Vishnu Viswanath :
> Thank you Yanbo,
>
> It looks like this is available in 1.6 version
Thank you.
On Wed, Dec 2, 2015 at 8:12 PM, Yanbo Liang wrote:
> You can get 1.6.0-RC1 from
> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
> currently, but it's not the final release version.
>
> 2015-12-02 23:57 GMT+08:00 Vishnu Viswanath
Hi Jeff,
I went through the link you provided and I now understand how fit()
and transform() work.
I tried to use the pipeline in my code and I am getting an exception: Caused
by: org.apache.spark.SparkException: Unseen label:
The reason for this error as per my understanding is:
For the
Thanks for the reply Yanbo.
I understand that the model will be trained using the indexer map created
during the training stage.
But since I am getting a new set of data during prediction, and I have to
do StringIndexing on the new data also,
Right now I am using a new StringIndexer for this
StringIndexer is an estimator: fitting it produces a model that is used in both
training & prediction, so the indexing is consistent between training & prediction.
You may want to read this section of the Spark ML docs
http://spark.apache.org/docs/latest/ml-guide.html#how-it-works
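
To illustrate the fit-once, reuse-everywhere pattern described above, here is a minimal plain-Python sketch. It is not Spark code: fit_indexer and transform are hypothetical stand-ins for StringIndexer's fit() and the fitted model's transform(). The frequency ordering (most frequent label gets index 0) follows StringIndexer's documented behavior; the alphabetical tie-break is an assumption made only to keep the sketch deterministic.

```python
from collections import Counter


def fit_indexer(labels):
    """Learn a label -> index map from training labels (hypothetical helper).

    Labels are ordered by descending frequency, so the most frequent
    label gets index 0. Ties are broken alphabetically (an assumption).
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


def transform(mapping, labels):
    """Apply an already-fitted map; nothing is re-learned here."""
    return [mapping[label] for label in labels]


train = ["A", "B", "A", "C", "A", "B"]
mapping = fit_indexer(train)                  # fitted once, on training data
train_idx = transform(mapping, train)
predict_idx = transform(mapping, ["C", "A"])  # same map reused at prediction time
```

The point of the sketch is that the prediction data is never passed to fit_indexer; fitting a second indexer on the new data could assign different indices to the same labels.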
On Mon, Nov 30, 2015 at 12:52
Thank you Jeff.
On Sun, Nov 29, 2015 at 7:36 PM, Jeff Zhang wrote:
> StringIndexer is an estimator which would train a model to be used both in
> training & prediction. So it is consistent between training & prediction.
>
> You may want to read this section of spark ml doc
>
Hi All,
I have a general question on using StringIndexer.
StringIndexer assigns an index to each label in the column, starting from 0
(0 for the most frequent label).
Suppose I am building a model, and I use StringIndexer to transform one
of my columns.
e.g., suppose A was most frequent word