This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 132bb63a897 [SPARK-46016][DOCS][PS] Fix pandas API support list properly
132bb63a897 is described below
commit 132bb63a897f4f4049f34deefc065ed3eac6a90f
Author: Haejoon Lee <[email protected]>
AuthorDate: Fri Nov 24 19:38:31 2023 +0900
[SPARK-46016][DOCS][PS] Fix pandas API support list properly
### What changes were proposed in this pull request?
This PR proposes to fix a critical issue in the [Supported pandas API
documentation](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/supported_pandas_api.html)
where many essential APIs such as `DataFrame.max`, `DataFrame.min`,
`DataFrame.mean`, and `DataFrame.median` were incorrectly marked as not
implemented ("N"), as below:
<img width="291" alt="Screenshot 2023-11-24 at 12 37 49 PM"
src="https://github.com/apache/spark/assets/44108233/95c5785c-711c-400c-b2ec-0db034e90fd8">
The root cause of this issue was that the script used to generate the
support list excluded functions inherited from parent classes. For instance,
`CategoricalIndex.max` is supported through inheritance from the `Index`
class, but because it is not defined directly in `CategoricalIndex`, it was
marked as unsupported:
<img width="397" alt="Screenshot 2023-11-24 at 12 30 08 PM"
src="https://github.com/apache/spark/assets/44108233/90e92996-a88a-4a20-bb0c-4909097e2688">
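The inheritance behavior described above can be reproduced with a minimal sketch; the classes here are hypothetical stand-ins that only mirror the naming of the real pandas classes:

```python
import inspect

class Index:
    def max(self):  # implemented on the parent class
        ...

class CategoricalIndex(Index):  # inherits max(), defines nothing of its own
    pass

# inspect.getmembers walks the MRO, so inherited methods are included
members = dict(inspect.getmembers(CategoricalIndex, inspect.isfunction))
print("max" in members)                    # True
# __dict__ only holds attributes defined directly on the class itself
print("max" in CategoricalIndex.__dict__)  # False
```

The old script's extra `m[0] in pd_module.__dict__` check therefore dropped every inherited method, which is exactly why `CategoricalIndex.max` showed up as "N".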
### Why are the changes needed?
The current documentation inaccurately represents the set of supported
pandas APIs, which could significantly hinder user experience and adoption.
Correcting these inaccuracies ensures that the documentation reflects the
true capabilities of Pandas API on Spark, providing users with reliable and
accurate information.
### Does this PR introduce _any_ user-facing change?
No. This PR only updates the documentation to accurately reflect the
current state of supported pandas API.
### How was this patch tested?
Manually built the documentation and verified that the supported pandas API
list is generated correctly, as below:
<img width="299" alt="Screenshot 2023-11-24 at 12 36 31 PM"
src="https://github.com/apache/spark/assets/44108233/a2da0f0b-0973-45cb-b22d-9582bbeb51b5">
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #43996 from itholic/fix_supported_api_gen.
Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/pandas/supported_api_gen.py | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)
diff --git a/python/pyspark/pandas/supported_api_gen.py b/python/pyspark/pandas/supported_api_gen.py
index a83731db8fc..27d5cd4b37f 100644
--- a/python/pyspark/pandas/supported_api_gen.py
+++ b/python/pyspark/pandas/supported_api_gen.py
@@ -138,23 +138,11 @@ def _create_supported_by_module(
# module not implemented
return {}
- pd_funcs = dict(
- [
- m
- for m in getmembers(pd_module, isfunction)
- if not m[0].startswith("_") and m[0] in pd_module.__dict__
- ]
- )
+ pd_funcs = dict([m for m in getmembers(pd_module, isfunction) if not m[0].startswith("_")])
if not pd_funcs:
return {}
- ps_funcs = dict(
- [
- m
- for m in getmembers(ps_module, isfunction)
- if not m[0].startswith("_") and m[0] in ps_module.__dict__
- ]
- )
+ ps_funcs = dict([m for m in getmembers(ps_module, isfunction) if not m[0].startswith("_")])
return _organize_by_implementation_status(
module_name, pd_funcs, ps_funcs, pd_module_group, ps_module_group
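The corrected filter can be sketched as a standalone helper; `public_functions`, `Base`, and `Child` are hypothetical names for illustration, not part of the PySpark codebase:

```python
import inspect

def public_functions(obj):
    # Collect every public function getmembers can see on a module or
    # class. getmembers walks the MRO, so methods inherited from parent
    # classes are kept; the removed `name in obj.__dict__` check had
    # filtered those out.
    return {
        name: fn
        for name, fn in inspect.getmembers(obj, inspect.isfunction)
        if not name.startswith("_")
    }

class Base:
    def supported(self):
        return True

class Child(Base):  # inherits supported() without redefining it
    pass

print(sorted(public_functions(Child)))  # ['supported']
```

With the old `__dict__` check, `Child` would have reported no public functions at all, even though `supported()` is callable on it.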
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]