This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new c483e29 [SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame
c483e29 is described below
commit c483e2977cbc6ae33d999c9c9d1dbacd9c53d85a
Author: Xinrong Meng <[email protected]>
AuthorDate: Thu Mar 10 15:32:48 2022 +0900
[SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame
### What changes were proposed in this pull request?
Fix docstrings of nlargest/nsmallest of DataFrame
### Why are the changes needed?
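To make the docstrings less confusing. The example text named columns ("population", "a", "c") that do not exist in the example DataFrame, and the nsmallest docstring said "largest" where it meant "smallest".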
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test.
Closes #35793 from xinrong-databricks/frame.ntop.
Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/pandas/frame.py | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index d4803eb..64a6471 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -7283,7 +7283,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
)
return internal
- # TODO: add keep = First
+ # TODO: add keep = First
def nlargest(self, n: int, columns: Union[Name, List[Name]]) -> "DataFrame":
"""
Return the first `n` rows ordered by `columns` in descending order.
@@ -7340,7 +7340,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
6 NaN 12
In the following example, we will use ``nlargest`` to select the three
- rows having the largest values in column "population".
+ rows having the largest values in column "X".
>>> df.nlargest(n=3, columns='X')
X Y
@@ -7348,12 +7348,14 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
4 6.0 10
3 5.0 9
+ To order by the largest values in column "Y" and then "X", we can
+ specify multiple columns like in the next example.
+
>>> df.nlargest(n=3, columns=['Y', 'X'])
X Y
6 NaN 12
5 7.0 11
4 6.0 10
-
"""
return self.sort_values(by=columns, ascending=False).head(n=n)
@@ -7403,7 +7405,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
6 NaN 12
In the following example, we will use ``nsmallest`` to select the
- three rows having the smallest values in column "a".
+ three rows having the smallest values in column "X".
>>> df.nsmallest(n=3, columns='X') # doctest: +NORMALIZE_WHITESPACE
X Y
@@ -7411,7 +7413,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
1 2.0 7
2 3.0 8
- To order by the largest values in column "a" and then "c", we can
+ To order by the smallest values in column "Y" and then "X", we can
specify multiple columns like in the next example.
>>> df.nsmallest(n=3, columns=['Y', 'X']) # doctest: +NORMALIZE_WHITESPACE
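
For readers checking the corrected examples by hand: the diff shows that nlargest is implemented as sort_values(by=columns, ascending=False).head(n=n). The sketch below mimics that ordering in plain Python, without Spark, to show why the docstring outputs come out as they do. It is an illustration only, not the pandas-on-Spark implementation. The helper name nlargest_sketch is hypothetical, the data for row 0 is assumed (the diff only shows rows 1 to 6), and placing NaN last is an assumption chosen to match the outputs shown in the docstring.

```python
import math


def nlargest_sketch(rows, n, columns):
    """Hypothetical stand-in for sort_values(by=columns, ascending=False).head(n).

    `rows` maps an index label to a dict of column values.
    Returns a list of (index, row) pairs.
    """
    if isinstance(columns, str):
        columns = [columns]

    def sort_key(item):
        _, row = item
        key = []
        for col in columns:
            value = row[col]
            missing = isinstance(value, float) and math.isnan(value)
            # False sorts before True, so missing values go last (assumed);
            # negating the value puts larger values first.
            key.append((missing, 0.0 if missing else -value))
        return tuple(key)

    return sorted(rows.items(), key=sort_key)[:n]


# Example frame reconstructed from the docstring; row 0 is an assumption.
x_values = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0, math.nan]
y_values = [6, 7, 8, 9, 10, 11, 12]
rows = {i: {"X": x, "Y": y} for i, (x, y) in enumerate(zip(x_values, y_values))}

top_by_x = [index for index, _ in nlargest_sketch(rows, 3, "X")]
top_by_y_then_x = [index for index, _ in nlargest_sketch(rows, 3, ["Y", "X"])]
print(top_by_x)         # index labels of the three largest X values
print(top_by_y_then_x)  # ordered by Y first, then X
```

With a single column the NaN row is never selected, while ordering by "Y" first selects it because its Y value is the largest. This matches the two nlargest outputs in the docstring.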