This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 400db88d00e5 [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing
PySpark documentation
400db88d00e5 is described below
commit 400db88d00e50750513d733be697b6b2dd9043d3
Author: Haejoon Lee <[email protected]>
AuthorDate: Mon Nov 27 08:49:18 2023 +0900
[SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation
### What changes were proposed in this pull request?
This PR proposes to enhance the PySpark documentation by leveraging modern
Sphinx features and functionalities. The primary objective is to improve the
overall user experience and readability of the documentation. To achieve this,
the PR upgrades `Sphinx` and `Jinja2` to newer versions, enabling us to use
recent `pydata_sphinx_theme` features such as light/dark mode toggling.
### Why are the changes needed?
Currently, the PySpark documentation is unable to utilize many of the
advanced features available in recent `Sphinx` versions due to older package
versions. This limitation hinders the documentation's visual appeal and
usability, particularly when compared to other projects like Pandas which have
already adopted these enhancements. For example:
## Pandas API reference (better layout / switching light & dark mode
available)
### Dark mode
<img width="1409" alt="Screenshot 2023-11-26 at 5 43 29 AM"
src="https://github.com/apache/spark/assets/44108233/0f97ce4a-c1ec-47fb-9295-445c2d557393">
### Light mode
<img width="1403" alt="Screenshot 2023-11-26 at 5 45 01 AM"
src="https://github.com/apache/spark/assets/44108233/715f74a8-9e49-4c05-80ef-5531d2e68220">
## PySpark API reference (less readable compared to pandas / no light & dark
mode)
<img width="1312" alt="Screenshot 2023-11-26 at 5 43 48 AM"
src="https://github.com/apache/spark/assets/44108233/722d2b61-e231-4387-a5ab-dcd447045d94">
By updating the `Sphinx` and `Jinja2` versions, we can significantly
improve the documentation's layout, design, and interactive features, thereby
enhancing the end-user experience.
### Does this PR introduce _any_ user-facing change?
No API changes, but users will notice a more modern and user-friendly
interface in the PySpark documentation. New features such as light/dark mode
and improved page layouts are available, as shown below:
## Before
<img width="1312" alt="Screenshot 2023-11-26 at 5 43 48 AM"
src="https://github.com/apache/spark/assets/44108233/722d2b61-e231-4387-a5ab-dcd447045d94">
## After
### Dark mode
<img width="1388" alt="Screenshot 2023-11-26 at 6 17 13 AM"
src="https://github.com/apache/spark/assets/44108233/b5ed6cfd-9a65-4c03-a067-b40e89cc8c48">
### Light mode
<img width="1392" alt="Screenshot 2023-11-26 at 6 16 47 AM"
src="https://github.com/apache/spark/assets/44108233/24b723a7-5b00-4565-81d9-9c87154c115f">
### How was this patch tested?
Manually built the docs in a local environment, and also tested combinations
of various `Jinja2`, `Sphinx` and `pydata_sphinx_theme` versions to find the
best document rendering.
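
When trying different version combinations like this, it can help to confirm
which versions are actually installed before building. The following is only
a minimal sketch and not part of this patch; the helper name
`installed_doc_versions` and the package tuple are assumptions made for
illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in this PR. The tuple is an illustrative assumption;
# "pydata-sphinx-theme" is the PyPI spelling of `pydata_sphinx_theme`
# (pip treats hyphens and underscores as equivalent).
DOC_PACKAGES = ("sphinx", "jinja2", "pydata-sphinx-theme")


def installed_doc_versions(packages=DOC_PACKAGES):
    """Return {package: installed version string, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_doc_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running the script in the docs build environment prints one line per package,
which makes it easy to compare against the pins in `dev/requirements.txt`.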
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44012 from itholic/upgrade_sphinx.
Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
.github/workflows/build_and_test.yml | 2 +-
dev/requirements.txt | 6 +--
python/docs/source/_static/spark-logo-dark.png | Bin 0 -> 23555 bytes
python/docs/source/_static/spark-logo-light.png | Bin 0 -> 18773 bytes
.../_templates/autosummary/accessor_attribute.rst | 6 +++
.../_templates/autosummary/accessor_method.rst | 6 +++
.../_templates/autosummary/class_with_docs.rst | 4 +-
.../source/_templates/autosummary/plot_class.rst | 53 +++++++++++++++++++++
python/docs/source/conf.py | 6 ++-
.../docs/source/reference/pyspark.pandas/frame.rst | 8 +++-
.../source/reference/pyspark.pandas/indexing.rst | 12 +++++
python/docs/source/reference/pyspark.pandas/io.rst | 5 ++
.../source/reference/pyspark.pandas/series.rst | 22 ++++++++-
.../source/reference/pyspark.sql/spark_session.rst | 14 ++++++
14 files changed, 136 insertions(+), 8 deletions(-)
diff --git a/.github/workflows/build_and_test.yml
b/.github/workflows/build_and_test.yml
index 5033ab00601a..a4c9ec304258 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -751,7 +751,7 @@ jobs:
# See also https://issues.apache.org/jira/browse/SPARK-35375.
# Pin the MarkupSafe to 2.0.1 to resolve the CI error.
# See also https://issues.apache.org/jira/browse/SPARK-38279.
- python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme
sphinx-copybutton nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
'pyzmq<24.0.0'
+ python3.9 -m pip install 'sphinx==4.2.0' mkdocs
'pydata_sphinx_theme==0.13' sphinx-copybutton nbsphinx numpydoc jinja2
'markupsafe==2.0.1' 'pyzmq<24.0.0'
python3.9 -m pip install ipython_genutils # See SPARK-38517
python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0'
pyarrow pandas 'plotly>=4.8'
python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 7de55ec24968..a7af0907c726 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -31,12 +31,12 @@ pandas-stubs<1.2.0.54
mkdocs
# Documentation (Python)
-pydata_sphinx_theme
+pydata_sphinx_theme==0.13
ipython
nbsphinx
numpydoc
-jinja2<3.0.0
-sphinx<3.1.0
+jinja2
+sphinx==4.2.0
sphinx-plotly-directive
sphinx-copybutton
docutils<0.18.0
diff --git a/python/docs/source/_static/spark-logo-dark.png
b/python/docs/source/_static/spark-logo-dark.png
new file mode 100644
index 000000000000..7460faec37fc
Binary files /dev/null and b/python/docs/source/_static/spark-logo-dark.png
differ
diff --git a/python/docs/source/_static/spark-logo-light.png
b/python/docs/source/_static/spark-logo-light.png
new file mode 100644
index 000000000000..41938560822c
Binary files /dev/null and b/python/docs/source/_static/spark-logo-light.png
differ
diff --git a/python/docs/source/_templates/autosummary/accessor_attribute.rst
b/python/docs/source/_templates/autosummary/accessor_attribute.rst
new file mode 100644
index 000000000000..28a94614b98f
--- /dev/null
+++ b/python/docs/source/_templates/autosummary/accessor_attribute.rst
@@ -0,0 +1,6 @@
+{{ fullname }}
+{{ underline }}
+
+.. currentmodule:: {{ module + "." + objname.split(".")[0] }}
+
+.. autoattribute:: {{ ".".join(objname.split(".")[1:]) }}
diff --git a/python/docs/source/_templates/autosummary/accessor_method.rst
b/python/docs/source/_templates/autosummary/accessor_method.rst
new file mode 100644
index 000000000000..dce014d7b5da
--- /dev/null
+++ b/python/docs/source/_templates/autosummary/accessor_method.rst
@@ -0,0 +1,6 @@
+{{ fullname }}
+{{ underline }}
+
+.. currentmodule:: {{ module + "." + objname.split(".")[0] }}
+
+.. automethod:: {{ ".".join(objname.split(".")[1:]) }}
diff --git a/python/docs/source/_templates/autosummary/class_with_docs.rst
b/python/docs/source/_templates/autosummary/class_with_docs.rst
index 7c37b83c0e90..1141fa68a256 100644
--- a/python/docs/source/_templates/autosummary/class_with_docs.rst
+++ b/python/docs/source/_templates/autosummary/class_with_docs.rst
@@ -47,7 +47,9 @@
.. autosummary::
{% for item in attributes %}
- ~{{ name }}.{{ item }}
+ {% if not (item == 'uid') %}
+ ~{{ name }}.{{ item }}
+ {% endif %}
{%- endfor %}
{% endif %}
diff --git a/python/docs/source/_templates/autosummary/plot_class.rst
b/python/docs/source/_templates/autosummary/plot_class.rst
new file mode 100644
index 000000000000..5e6a73bd0ecc
--- /dev/null
+++ b/python/docs/source/_templates/autosummary/plot_class.rst
@@ -0,0 +1,53 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+{{ fullname }}
+{{ underline }}
+
+.. currentmodule:: {{ module + "." + objname.split(".")[0] }}
+
+.. automethod:: {{ ".".join(objname.split(".")[1:]) }}
+
+{% if '__init__' in methods %}
+ {% set caught_result = methods.remove('__init__') %}
+{% endif %}
+
+{% block methods %}
+{% if methods %}
+
+ .. rubric:: Methods
+
+ .. autosummary::
+ {% for item in methods %}
+ ~{{ name.split(".")[1] }}.{{ item }}
+ {%- endfor %}
+
+{% endif %}
+{% endblock %}
+
+{% block attributes_summary %}
+{% if attributes %}
+
+ .. rubric:: Attributes
+
+ .. autosummary::
+ {% for item in attributes %}
+ ~{{ name.split(".")[1] }}.{{ item }}
+ {%- endfor %}
+
+{% endif %}
+{% endblock %}
diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py
index b9884d55b3a1..81083c007b34 100644
--- a/python/docs/source/conf.py
+++ b/python/docs/source/conf.py
@@ -194,7 +194,11 @@ html_context = {
# further. For a list of options available for each theme, see the
# documentation.
html_theme_options = {
- "navbar_end": ["version-switcher"]
+ "navbar_end": ["version-switcher", "theme-switcher"],
+ "logo": {
+ "image_light": "_static/spark-logo-light.png",
+ "image_dark": "_static/spark-logo-dark.png",
+ }
}
# Add any paths that contain custom themes here, relative to this directory.
diff --git a/python/docs/source/reference/pyspark.pandas/frame.rst
b/python/docs/source/reference/pyspark.pandas/frame.rst
index 911999b56be5..12cf6e7db12f 100644
--- a/python/docs/source/reference/pyspark.pandas/frame.rst
+++ b/python/docs/source/reference/pyspark.pandas/frame.rst
@@ -299,6 +299,7 @@ in Spark. These can be accessed by
``DataFrame.spark.<function/property>``.
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_method.rst
DataFrame.spark.frame
DataFrame.spark.cache
@@ -319,8 +320,8 @@ specific plotting methods of the form
``DataFrame.plot.<kind>``.
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_method.rst
- DataFrame.plot
DataFrame.plot.area
DataFrame.plot.barh
DataFrame.plot.bar
@@ -330,6 +331,10 @@ specific plotting methods of the form
``DataFrame.plot.<kind>``.
DataFrame.plot.pie
DataFrame.plot.scatter
DataFrame.plot.density
+
+.. autosummary::
+ :toctree: api/
+
DataFrame.hist
DataFrame.boxplot
DataFrame.kde
@@ -341,6 +346,7 @@ These can be accessed by
``DataFrame.pandas_on_spark.<function/property>``.
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_method.rst
DataFrame.pandas_on_spark.apply_batch
DataFrame.pandas_on_spark.transform_batch
diff --git a/python/docs/source/reference/pyspark.pandas/indexing.rst
b/python/docs/source/reference/pyspark.pandas/indexing.rst
index 7ec4387bb679..301e849ffe28 100644
--- a/python/docs/source/reference/pyspark.pandas/indexing.rst
+++ b/python/docs/source/reference/pyspark.pandas/indexing.rst
@@ -129,8 +129,14 @@ in Spark. These can be accessed by
``Index.spark.<function/property>``.
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_attribute.rst
Index.spark.column
+
+.. autosummary::
+ :toctree: api/
+ :template: autosummary/accessor_method.rst
+
Index.spark.transform
Sorting
@@ -308,9 +314,15 @@ in Spark. These can be accessed by
``MultiIndex.spark.<function/property>``.
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_attribute.rst
MultiIndex.spark.data_type
MultiIndex.spark.column
+
+.. autosummary::
+ :toctree: api/
+ :template: autosummary/accessor_method.rst
+
MultiIndex.spark.transform
MultiIndex Sorting
diff --git a/python/docs/source/reference/pyspark.pandas/io.rst
b/python/docs/source/reference/pyspark.pandas/io.rst
index 118dd49a4ada..fd41a03699ca 100644
--- a/python/docs/source/reference/pyspark.pandas/io.rst
+++ b/python/docs/source/reference/pyspark.pandas/io.rst
@@ -69,6 +69,11 @@ Generic Spark I/O
:toctree: api/
read_spark_io
+
+.. autosummary::
+ :toctree: api/
+ :template: autosummary/accessor_method.rst
+
DataFrame.spark.to_spark_io
Flat File / CSV
diff --git a/python/docs/source/reference/pyspark.pandas/series.rst
b/python/docs/source/reference/pyspark.pandas/series.rst
index 01fb5aa87fb1..88d1861c6ccf 100644
--- a/python/docs/source/reference/pyspark.pandas/series.rst
+++ b/python/docs/source/reference/pyspark.pandas/series.rst
@@ -270,8 +270,14 @@ in Spark. These can be accessed by
``Series.spark.<function/property>``.
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_attribute.rst
Series.spark.column
+
+.. autosummary::
+ :toctree: api/
+ :template: autosummary/accessor_method.rst
+
Series.spark.transform
Series.spark.apply
@@ -304,6 +310,7 @@ Datetime Properties
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_attribute.rst
Series.dt.date
Series.dt.year
@@ -333,6 +340,7 @@ Datetime Methods
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_method.rst
Series.dt.normalize
Series.dt.strftime
@@ -353,6 +361,7 @@ like ``Series.str.<function/property>``.
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_method.rst
Series.str.capitalize
Series.str.cat
@@ -416,10 +425,16 @@ the ``Series.cat`` accessor.
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_attribute.rst
Series.cat.categories
Series.cat.ordered
Series.cat.codes
+
+.. autosummary::
+ :toctree: api/
+ :template: autosummary/accessor_method.rst
+
Series.cat.rename_categories
Series.cat.reorder_categories
Series.cat.add_categories
@@ -438,8 +453,8 @@ specific plotting methods of the form
``Series.plot.<kind>``.
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_method.rst
- Series.plot
Series.plot.area
Series.plot.bar
Series.plot.barh
@@ -449,6 +464,10 @@ specific plotting methods of the form
``Series.plot.<kind>``.
Series.plot.line
Series.plot.pie
Series.plot.kde
+
+.. autosummary::
+ :toctree: api/
+
Series.hist
Serialization / IO / Conversion
@@ -476,6 +495,7 @@ These can be accessed by
``Series.pandas_on_spark.<function/property>``.
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_method.rst
Series.pandas_on_spark.transform_batch
diff --git a/python/docs/source/reference/pyspark.sql/spark_session.rst
b/python/docs/source/reference/pyspark.sql/spark_session.rst
index f25dbab5f6b9..f242e4439cf4 100644
--- a/python/docs/source/reference/pyspark.sql/spark_session.rst
+++ b/python/docs/source/reference/pyspark.sql/spark_session.rst
@@ -29,12 +29,21 @@ See also :class:`SparkSession`.
:toctree: api/
SparkSession.active
+
+.. autosummary::
+ :toctree: api/
+ :template: autosummary/accessor_method.rst
+
SparkSession.builder.appName
SparkSession.builder.config
SparkSession.builder.enableHiveSupport
SparkSession.builder.getOrCreate
SparkSession.builder.master
SparkSession.builder.remote
+
+.. autosummary::
+ :toctree: api/
+
SparkSession.catalog
SparkSession.conf
SparkSession.createDataFrame
@@ -58,8 +67,13 @@ Spark Connect Only
.. autosummary::
:toctree: api/
+ :template: autosummary/accessor_method.rst
SparkSession.builder.create
+
+.. autosummary::
+ :toctree: api/
+
SparkSession.addArtifact
SparkSession.addArtifacts
SparkSession.copyFromLocalToFs
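
The new `accessor_method.rst` and `accessor_attribute.rst` templates split
`objname` on its first dot to build the `currentmodule` target and the member
name. The plain-Python sketch below mirrors those two Jinja expressions. It is
not part of this patch, and it assumes that `module` is `pyspark.pandas` and
`objname` is `Series.str.capitalize`; the values Sphinx actually passes are
not shown in this commit, and the helper name `accessor_targets` is
hypothetical.

```python
def accessor_targets(module, objname):
    """Mirror the two expressions used in the accessor templates."""
    parts = objname.split(".")
    # {{ module + "." + objname.split(".")[0] }}
    current_module = module + "." + parts[0]
    # {{ ".".join(objname.split(".")[1:]) }}
    member = ".".join(parts[1:])
    return current_module, member


# Assumed inputs, chosen for illustration only.
print(accessor_targets("pyspark.pandas", "Series.str.capitalize"))
# → ('pyspark.pandas.Series', 'str.capitalize')
```

Under these assumptions, the template would emit
`.. currentmodule:: pyspark.pandas.Series` followed by
`.. automethod:: str.capitalize`.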