Repository: spark
Updated Branches:
refs/heads/master de7806048 -> afb131637
[SPARK-5678] Convert DataFrame to pandas.DataFrame and Series
```
pyspark.sql.DataFrame.to_pandas = to_pandas(self) unbound pyspark.sql.DataFrame method
    Collect all the rows and return a `pandas.DataFrame`.

    >>> df.to_pandas()  # doctest: +SKIP
       age   name
    0    2  Alice
    1    5    Bob

pyspark.sql.Column.to_pandas = to_pandas(self) unbound pyspark.sql.Column method
    Return a pandas.Series from the column

    >>> df.age.to_pandas()  # doctest: +SKIP
    0    2
    1    5
    dtype: int64
```
Not tested by Jenkins (the doctests depend on pandas)
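The conversion itself can be tried without a Spark cluster. The following is a minimal sketch, not part of the patch: it assumes that `collect()` returns a list of tuple-like rows, and the `rows`, `columns`, and `single_column_rows` values are hypothetical stand-ins that mirror the doctest data.

```python
import pandas as pd

# Hypothetical stand-in for DataFrame.collect(): a list of row tuples.
rows = [(2, "Alice"), (5, "Bob")]
# Hypothetical stand-in for DataFrame.columns.
columns = ["age", "name"]

# Same call the patch uses in DataFrame.to_pandas: one record per row.
pdf = pd.DataFrame.from_records(rows, columns=columns)

# Same idea as Column.to_pandas: each collected row holds a single value,
# so the `c,` target unpacks it from the one-element tuple.
single_column_rows = [(2,), (5,)]
series = pd.Series([c for c, in single_column_rows])

print(pdf)
print(series)
```

Because both methods collect every row before building the pandas object, the whole result has to fit in the memory of the process that calls them.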
Author: Davies Liu <[email protected]>
Closes #4476 from davies/to_pandas and squashes the following commits:
6276fb6 [Davies Liu] Convert DataFrame to pandas.DataFrame and Series
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/afb13163
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/afb13163
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/afb13163
Branch: refs/heads/master
Commit: afb131637d96e1e5e07eb8abf24e32e7f3b2304d
Parents: de78060
Author: Davies Liu <[email protected]>
Authored: Mon Feb 9 11:42:52 2015 -0800
Committer: Reynold Xin <[email protected]>
Committed: Mon Feb 9 11:42:52 2015 -0800
----------------------------------------------------------------------
python/pyspark/sql.py | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/afb13163/python/pyspark/sql.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql.py b/python/pyspark/sql.py
index e55f285..6a6dfbc 100644
--- a/python/pyspark/sql.py
+++ b/python/pyspark/sql.py
@@ -2284,6 +2284,18 @@ class DataFrame(object):
         """
         return self.select('*', col.alias(colName))
+    def to_pandas(self):
+        """
+        Collect all the rows and return a `pandas.DataFrame`.
+
+        >>> df.to_pandas()  # doctest: +SKIP
+           age   name
+        0    2  Alice
+        1    5    Bob
+        """
+        import pandas as pd
+        return pd.DataFrame.from_records(self.collect(), columns=self.columns)
+
 # Having SchemaRDD for backward compatibility (for docs)
 class SchemaRDD(DataFrame):
@@ -2551,6 +2563,19 @@ class Column(DataFrame):
         jc = self._jc.cast(jdt)
         return Column(jc, self.sql_ctx)
+    def to_pandas(self):
+        """
+        Return a pandas.Series from the column
+
+        >>> df.age.to_pandas()  # doctest: +SKIP
+        0    2
+        1    5
+        dtype: int64
+        """
+        import pandas as pd
+        data = [c for c, in self.collect()]
+        return pd.Series(data)
+
 def _aggregate_func(name, doc=""):
     """ Create a function for aggregator by name"""