This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 548d8a9a157 [SPARK-44728][PYTHON][DOCS] Add examples to approxQuantile docstring
548d8a9a157 is described below
commit 548d8a9a1572d49a49f52d073ff752efcf56b9ef
Author: Michael Zhang <[email protected]>
AuthorDate: Wed Aug 30 10:44:42 2023 +0800
[SPARK-44728][PYTHON][DOCS] Add examples to approxQuantile docstring
### What changes were proposed in this pull request?
Added examples to `DataFrame.approxQuantile`. Also updated the expected
input description to reflect input types.
### Why are the changes needed?
To improve the PySpark documentation.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
DocTest.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #42637 from michaelzhan-db/approxQuantile-doc.
Lead-authored-by: Michael Zhang <[email protected]>
Co-authored-by: michaelzhan-db <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
python/pyspark/sql/dataframe.py | 40 +++++++++++++++++++++++++++++++++++++---
1 file changed, 37 insertions(+), 3 deletions(-)
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 2b0afabf854..1d48e14b420 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -4831,10 +4831,10 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
.. versionchanged:: 2.2.0
Added support for multiple columns.
- probabilities : list or tuple
+ probabilities : list or tuple of floats
a list of quantile probabilities
- Each number must belong to [0, 1].
- For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
+ Each number must be a float in the range [0, 1].
+ For example 0.0 is the minimum, 0.5 is the median, 1.0 is the maximum.
relativeError : float
The relative target precision to achieve
(>= 0). If set to zero, the exact quantiles are computed, which
@@ -4856,6 +4856,40 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
-----
Null values will be ignored in numerical columns before calculation.
For columns only containing null values, an empty list is returned.
+
+ Examples
+ --------
+ Example 1: Calculating quantiles for a single column
+
+ >>> data = [(1,), (2,), (3,), (4,), (5,)]
+ >>> df = spark.createDataFrame(data, ["values"])
+ >>> quantiles = df.approxQuantile("values", [0.0, 0.5, 1.0], 0.05)
+ >>> quantiles
+ [1.0, 3.0, 5.0]
+
+ Example 2: Calculating quantiles for multiple columns
+
+ >>> data = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
+ >>> df = spark.createDataFrame(data, ["col1", "col2"])
+ >>> quantiles = df.approxQuantile(["col1", "col2"], [0.0, 0.5, 1.0], 0.05)
+ >>> quantiles
+ [[1.0, 3.0, 5.0], [10.0, 30.0, 50.0]]
+
+ Example 3: Handling null values
+
+ >>> data = [(1,), (None,), (3,), (4,), (None,)]
+ >>> df = spark.createDataFrame(data, ["values"])
+ >>> quantiles = df.approxQuantile("values", [0.0, 0.5, 1.0], 0.05)
+ >>> quantiles
+ [1.0, 3.0, 4.0]
+
+ Example 4: Calculating quantiles with low precision
+
+ >>> data = [(1,), (2,), (3,), (4,), (5,)]
+ >>> df = spark.createDataFrame(data, ["values"])
+ >>> quantiles = df.approxQuantile("values", [0.0, 0.2, 1.0], 0.1)
+ >>> quantiles
+ [1.0, 1.0, 5.0]
"""
if not isinstance(col, (str, list, tuple)):
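
Note on Example 4: the result `[1.0, 1.0, 5.0]` may look surprising for probability 0.2. The sketch below is not part of the patch and does not call Spark. It is a minimal pure-Python illustration of the rank bound described in the `approxQuantile` documentation, floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). The helper names `rank_bounds` and `within_rank_bound` are hypothetical, and defining rank(x) as the count of elements less than or equal to x is an assumption made for illustration.

```python
import math


def rank_bounds(n, p, err):
    """Return the (low, high) rank bounds for probability p and relative error err."""
    return math.floor((p - err) * n), math.ceil((p + err) * n)


def within_rank_bound(values, candidate, p, err):
    """Check whether candidate is an acceptable approximate quantile of values."""
    # Nulls (None) are dropped first, mirroring the note in the docstring.
    non_null = [v for v in values if v is not None]
    # Assumption: rank(x) is the number of elements less than or equal to x.
    rank = sum(1 for v in non_null if v <= candidate)
    low, high = rank_bounds(len(non_null), p, err)
    return low <= rank <= high


data = [1, 2, 3, 4, 5]
print(within_rank_bound(data, 1.0, 0.2, 0.1))  # True: rank 1 lies in [0, 2]
print(within_rank_bound(data, 3.0, 0.2, 0.1))  # False: rank 3 exceeds 2
```

Under these assumptions, both 1.0 and 2.0 satisfy the bound for p = 0.2 with a relative error of 0.1, so the 1.0 in Example 4 is consistent with the documented guarantee even though it is not the exact 20th percentile.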