This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.0 by this push:
     new da5a501cbccd [MINOR][PS][DOC] Update pandas API on Spark option doc
da5a501cbccd is described below

commit da5a501cbccd9872c7b65054ad9ed26d9683e103
Author: Takuya Ueshin <ues...@databricks.com>
AuthorDate: Sun May 4 10:35:28 2025 +0900

    [MINOR][PS][DOC] Update pandas API on Spark option doc

    ### What changes were proposed in this pull request?

    Updates pandas API on Spark option doc.

    ### Why are the changes needed?

    The descriptions for some options are outdated.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    The existing tests should pass.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #50777 from ueshin/doc.

    Authored-by: Takuya Ueshin <ues...@databricks.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
    (cherry picked from commit e857f43cde5a00acc36d58a29eaa3cb5593161ef)
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/docs/source/tutorial/pandas_on_spark/options.rst | 15 ++++++++-------
 python/pyspark/pandas/config.py                         |  2 +-
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/python/docs/source/tutorial/pandas_on_spark/options.rst b/python/docs/source/tutorial/pandas_on_spark/options.rst
index 14164b771e3f..91f128eb351a 100644
--- a/python/docs/source/tutorial/pandas_on_spark/options.rst
+++ b/python/docs/source/tutorial/pandas_on_spark/options.rst
@@ -274,11 +274,11 @@ compute.max_rows            1000       'compute.max_rows' sets
                                        is unset, the operation is executed by PySpark.
                                        Default is 1000.
 compute.shortcut_limit     1000       'compute.shortcut_limit' sets the limit for a
-                                      shortcut. It computes specified number of rows and
-                                      use its schema. When the dataframe length is larger
-                                      than this limit, pandas-on-Spark uses PySpark to
-                                      compute.
-compute.ops_on_diff_frames False      This determines whether or not to operate between two
+                                      shortcut. It computes the specified number of rows
+                                      and uses its schema. When the dataframe length is
+                                      larger than this limit, pandas-on-Spark uses PySpark
+                                      to compute.
+compute.ops_on_diff_frames True       This determines whether or not to operate between two
                                       different dataframes.
                                       For example, 'combine_frames' function internally
                                       performs a join operation which can be expensive
                                       in general. So, if
@@ -325,8 +325,9 @@ plotting.max_rows           1000       'plotting.max_rows' sets
                                       used for plotting. Default is 1000.
 plotting.sample_ratio      None       'plotting.sample_ratio' sets the proportion of data
                                       that will be plotted for sample-based plots such as
-                                      `plot.line` and `plot.area`. This option defaults to
-                                      'plotting.max_rows' option.
+                                      `plot.line` and `plot.area`. If not set, it is
+                                      derived from 'plotting.max_rows', by calculating the
+                                      ratio of 'plotting.max_rows' to the total data size.
 plotting.backend           'plotly'   Backend to use for plotting. Default is plotly.
                                       Supports any package that has a top-level `.plot`
                                       method. Known options are: [matplotlib, plotly].
diff --git a/python/pyspark/pandas/config.py b/python/pyspark/pandas/config.py
index 6ed4adf21ff4..64fbd006570e 100644
--- a/python/pyspark/pandas/config.py
+++ b/python/pyspark/pandas/config.py
@@ -112,7 +112,7 @@ class Option:
 #
 # NOTE: if you are fixing or adding an option here, make sure you execute `show_options()` and
 # copy & paste the results into show_options
-# 'docs/source/user_guide/pandas_on_spark/options.rst' as well.
+# 'python/docs/source/tutorial/pandas_on_spark/options.rst' as well.
 # See the examples below:
 # >>> from pyspark.pandas.config import show_options
 # >>> show_options()

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
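[Editor's note] The updated description of 'plotting.sample_ratio' in the diff above says that, when the option is unset, the ratio is derived as 'plotting.max_rows' divided by the total data size. A minimal sketch of that derivation in plain Python follows; this is an illustration of the documented behavior only, not the actual pyspark.pandas implementation, and the function name `derive_sample_ratio` is hypothetical.

```python
def derive_sample_ratio(sample_ratio, max_rows, total_rows):
    """Return the fraction of rows to sample for plotting.

    If an explicit ratio is configured ('plotting.sample_ratio'),
    use it as-is. Otherwise derive the ratio of the row cap
    ('plotting.max_rows') to the total data size, capped at 1.0
    so small datasets are plotted in full.
    """
    if sample_ratio is not None:
        return sample_ratio
    if total_rows <= 0:
        return 1.0
    return min(1.0, max_rows / total_rows)


# With 1,000,000 rows and a cap of 1000, about 0.1% of rows get sampled.
print(derive_sample_ratio(None, 1000, 1_000_000))  # -> 0.001
# An explicitly configured ratio always wins over the derived one.
print(derive_sample_ratio(0.5, 1000, 1_000_000))   # -> 0.5
```

With this rule, datasets smaller than the row cap are never downsampled (the ratio caps at 1.0), while larger datasets are thinned just enough to stay near the configured cap.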