Repository: spark
Updated Branches:
  refs/heads/branch-2.2 211d81beb -> 8acce00ac

[SPARK-22107] Change as to alias in python quickstart

## What changes were proposed in this pull request?

Updated docs so that a line of python in the quick start guide executes. Closes #19283

## How was this patch tested?

Existing tests.

Author: John O'Leary <[email protected]>

Closes #19326 from jgoleary/issues/22107.

(cherry picked from commit 20adf9aa1f42353432d356117e655e799ea1290b)
Signed-off-by: hyukjinkwon <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8acce00a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8acce00a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8acce00a

Branch: refs/heads/branch-2.2
Commit: 8acce00acc343bc04a0f5af4ce4717b42c8938da
Parents: 211d81b
Author: John O'Leary <[email protected]>
Authored: Mon Sep 25 09:16:27 2017 +0900
Committer: hyukjinkwon <[email protected]>
Committed: Mon Sep 25 09:16:46 2017 +0900

----------------------------------------------------------------------
 docs/quick-start.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/8acce00a/docs/quick-start.md
----------------------------------------------------------------------
diff --git a/docs/quick-start.md b/docs/quick-start.md
index c4c5a5a..aac047f 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -153,7 +153,7 @@ This first maps a line to an integer value and aliases it as "numWords", creatin
 One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can implement MapReduce flows easily:
 
 {% highlight python %}
->>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).as("word")).groupBy("word").count()
+>>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).alias("word")).groupBy("word").count()
 {% endhighlight %}
 
 Here, we use the `explode` function in `select`, to transfrom a Dataset of lines to a Dataset of words, and then combine `groupBy` and `count` to compute the per-word counts in the file as a DataFrame of 2 columns: "word" and "count". To collect the word counts in our shell, we can call `collect`:

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
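Editor's note on the fix above: `as` is a reserved keyword in Python, so the Scala-style `.as("word")` cannot be called as a method there; PySpark's `Column` provides `alias` for the same purpose, which is why the doc line only executes after this change. For readers without a Spark shell handy, the word-count flow in the patched line can be sketched in plain Python. This is a hedged analogue, not the PySpark API: `re.split(r"\s+")` stands in for the `split`/`explode` pair, and `collections.Counter` for `groupBy("word").count()`; the sample `lines` data is invented for illustration.

```python
import re
from collections import Counter

# Stand-in for the textFile Dataset: one string per line (hypothetical sample data).
lines = ["spark is fast", "spark is fun"]

# split + explode: break each line on whitespace and flatten into one stream of words.
words = [w for line in lines for w in re.split(r"\s+", line.strip()) if w]

# groupBy("word").count(): tally occurrences per word.
word_counts = Counter(words)

print(word_counts["spark"])  # 2
```

The result plays the role of the two-column "word"/"count" DataFrame: each key is a word, each value its count across all lines.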
