Repository: spark
Updated Branches:
  refs/heads/master 188b47e68 -> 6eda55f72
Added more information to Imputer

Often times we want to impute custom values other than 'NaN'. My addition helps people locate this function without reading the API.

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: tengpeng <[email protected]>

Closes #19600 from tengpeng/patch-5.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6eda55f7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6eda55f7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6eda55f7

Branch: refs/heads/master
Commit: 6eda55f728a6f2e265ae12a7e01dae88e4172715
Parents: 188b47e
Author: tengpeng <[email protected]>
Authored: Mon Oct 30 07:24:55 2017 +0000
Committer: Sean Owen <[email protected]>
Committed: Mon Oct 30 07:24:55 2017 +0000

----------------------------------------------------------------------
 docs/ml-features.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/6eda55f7/docs/ml-features.md
----------------------------------------------------------------------
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 86a0e09..7264313 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -1373,7 +1373,9 @@ for more details on the API.
 The `Imputer` transformer completes missing values in a dataset, either using the mean or the median
 of the columns in which the missing values are located. The input columns should be of
 `DoubleType` or `FloatType`. Currently `Imputer` does not support categorical features and possibly
-creates incorrect values for columns containing categorical features.
+creates incorrect values for columns containing categorical features. Imputer can impute custom values
+other than 'NaN' by `.setMissingValue(custom_value)`. For example, `.setMissingValue(0)` will impute
+all occurrences of (0).
 
 **Note** all `null` values in the input columns are treated as missing, and so are also imputed.
 
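
The behaviour the added documentation describes can be illustrated with a small plain-Python sketch. It is not Spark code and does not call the `Imputer` API. The helper `impute_mean` and its `missing_value` parameter are hypothetical names that only mimic what `.setMissingValue(0)` is documented to do under the mean strategy: entries equal to the chosen value, and all nulls, are replaced by the mean of the remaining entries. Spark's handling of edge cases (for example `NaN` entries when a custom missing value is set) is not modelled here.

```python
import math
from statistics import mean
from typing import List, Optional


def impute_mean(values: List[Optional[float]],
                missing_value: float = math.nan) -> List[float]:
    """Replace missing entries with the mean of the non-missing ones.

    Hypothetical helper for illustration only; not part of Spark.
    """

    def is_missing(v: Optional[float]) -> bool:
        if v is None:
            # Nulls are always treated as missing, as the doc note says.
            return True
        if math.isnan(missing_value):
            # NaN never compares equal to itself, so test it explicitly.
            return math.isnan(v)
        return v == missing_value

    present = [v for v in values if not is_missing(v)]
    if not present:
        raise ValueError("no non-missing values to compute a surrogate from")

    surrogate = mean(present)  # the value used to fill the gaps
    return [surrogate if is_missing(v) else v for v in values]


# Treat 0.0 as the placeholder for "missing", in the spirit of
# .setMissingValue(0) in the documentation change above.
column = [1.0, 0.0, 3.0, None, 0.0]
imputed = impute_mean(column, missing_value=0.0)
print(imputed)
```

In this sketch both zeros and the `None` entry are filled with the mean of the two remaining values, while the default `missing_value` of `NaN` leaves zeros untouched.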
