Repository: spark
Updated Branches:
  refs/heads/branch-1.6 739d992f0 -> 393f4ba15


[DOCUMENTATION] fixed groupby aggregation example for pyspark

## What changes were proposed in this pull request?

Fixes the documentation for the groupBy/agg example in Python.

## How was this patch tested?

The existing example in the documentation does not contain valid syntax
(unbalanced parentheses) and does not use a `Column` in the expression for `agg()`.
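The syntax problem can be checked without a Spark installation. The following is a minimal sketch, not part of the patch: it only parses the old and new documentation lines with Python's standard `ast` module and never executes them. The names `OLD_LINE`, `NEW_LINE`, and `is_valid_python` are made up for this illustration.

```python
import ast

# The line as it appeared in the guide before this patch (copied from the diff).
OLD_LINE = 'df.groupBy("department").agg("department"), func.max("age"), func.sum("expense"))'
# The line after this patch.
NEW_LINE = 'df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense"))'


def is_valid_python(source):
    """Return True if `source` parses as Python. Nothing is executed,
    so `df` and `func` do not need to exist."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


old_ok = is_valid_python(OLD_LINE)
new_ok = is_valid_python(NEW_LINE)
print("old line parses:", old_ok)
print("new line parses:", new_ok)
```

This only covers the syntax half of the fix. Whether `agg()` accepts a `Column` for the grouping column is checked by the interactive session below.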

After the fix, here is how I tested it:

```
In [1]: from pyspark.sql import Row

In [2]: import pyspark.sql.functions as func

In [3]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:records = [{'age': 19, 'department': 1, 'expense': 100},
: {'age': 20, 'department': 1, 'expense': 200},
: {'age': 21, 'department': 2, 'expense': 300},
: {'age': 22, 'department': 2, 'expense': 300},
: {'age': 23, 'department': 3, 'expense': 300}]
:--

In [4]: df = sqlContext.createDataFrame([Row(**d) for d in records])

In [5]: df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense")).show()

+----------+----------+--------+------------+
|department|department|max(age)|sum(expense)|
+----------+----------+--------+------------+
|         1|         1|      20|         300|
|         2|         2|      22|         600|
|         3|         3|      23|         300|
+----------+----------+--------+------------+
```

Author: Mortada Mehyar <mortada.meh...@gmail.com>

Closes #13587 from mortada/groupby_agg_doc_fix.

(cherry picked from commit 675a73715d3c8adb9d9a9dce5f76a2db5106790c)
Signed-off-by: Reynold Xin <r...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/393f4ba1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/393f4ba1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/393f4ba1

Branch: refs/heads/branch-1.6
Commit: 393f4ba1516af47388e72310aee8dbbea9652134
Parents: 739d992
Author: Mortada Mehyar <mortada.meh...@gmail.com>
Authored: Fri Jun 10 00:23:34 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Fri Jun 10 00:23:49 2016 -0700

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/393f4ba1/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 803701e..26511b5 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -2248,7 +2248,7 @@ import pyspark.sql.functions as func
 
 # In 1.3.x, in order for the grouping column "department" to show up,
 # it must be included explicitly as part of the agg function call.
-df.groupBy("department").agg("department"), func.max("age"), func.sum("expense"))
+df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense"))
 
 # In 1.4+, grouping column "department" is included automatically.
 df.groupBy("department").agg(func.max("age"), func.sum("expense"))

