This is an automated email from the ASF dual-hosted git repository.
HyukjinKwon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 88e67883116a [MINOR][PYTHON][DOC] Fix single-colon ``.. versionadded:`` typos in docstrings
88e67883116a is described below
commit 88e67883116affd342442dafaee27a3e97022666
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Thu May 28 06:40:14 2026 +0900
[MINOR][PYTHON][DOC] Fix single-colon ``.. versionadded:`` typos in docstrings
### What changes were proposed in this pull request?
Fix all single-colon `.. versionadded:` typos in PySpark docstrings. The
Sphinx directive is `.. versionadded::` (two colons). The single-colon form
is not parsed as a directive: reStructuredText treats the line as a comment,
so the "New in version X.Y.Z" notice silently disappears from the rendered
API reference page.
All 32 occurrences are corrected by adding the missing second colon.
Version numbers themselves are unchanged.
| File | Count |
|---|---|
| `python/pyspark/sql/connect/datasource.py` | 1 |
| `python/pyspark/sql/dataframe.py` | 4 |
| `python/pyspark/sql/datasource.py` | 14 |
| `python/pyspark/sql/functions/builtin.py` | 1 |
| `python/pyspark/sql/merge.py` | 1 |
| `python/pyspark/sql/readwriter.py` | 11 |
| **Total** | **32** |
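
The correction described above amounts to a one-character substitution. As a minimal sketch (the helper name `fix_versionadded` is hypothetical and not part of this patch), it could be expressed as:

```python
import re

# Matches ".. versionadded" followed by exactly one colon.
# The negative lookahead leaves already-correct "::" directives untouched.
_SINGLE_COLON = re.compile(r"(\.\. versionadded):(?!:)")


def fix_versionadded(text: str) -> str:
    """Add the missing second colon to single-colon versionadded lines."""
    return _SINGLE_COLON.sub(r"\1::", text)
```

Applying it to text that already uses `.. versionadded::` leaves that text unchanged, so the substitution is safe to run more than once.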
### Why are the changes needed?
The "New in version ..." notice currently fails to render on 32 API
reference pages because of the typo. Users reading the rendered docs cannot
tell when each affected method/class was introduced.
### Does this PR introduce _any_ user-facing change?
Documentation-only change. After the fix, the affected API reference pages
will display the "New in version ..." notice as intended.
### How was this patch tested?
Doc-only change. Verified that
`grep -rn '\.\. versionadded: [0-9]' python/pyspark --include='*.py'`
returns no matches after the fix, and that no `versionchanged:` (sibling
directive) typos exist.
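
A rough Python equivalent of the check above, as a sketch only: the helper `find_typos` is hypothetical, and its pattern is slightly broader than the `grep` expression because it does not require a digit after the colon and it covers both directives in one pass.

```python
import re

# Flags ".. versionadded:" or ".. versionchanged:" written with one colon.
# Assumption: any single-colon form is a typo, regardless of what follows.
_TYPO = re.compile(r"\.\. version(?:added|changed):(?!:)")


def find_typos(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each single-colon directive."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if _TYPO.search(line)
    ]
```

An empty result for every file under `python/pyspark` would correspond to the "no matches" outcome reported above.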
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (model: claude-opus-4-7)
Closes #56142 from zhengruifeng/spark-doc-typo-dev1.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/sql/connect/datasource.py | 2 +-
python/pyspark/sql/dataframe.py | 8 ++++----
python/pyspark/sql/datasource.py | 28 ++++++++++++++--------------
python/pyspark/sql/functions/builtin.py | 2 +-
python/pyspark/sql/merge.py | 2 +-
python/pyspark/sql/readwriter.py | 22 +++++++++++-----------
6 files changed, 32 insertions(+), 32 deletions(-)
diff --git a/python/pyspark/sql/connect/datasource.py b/python/pyspark/sql/connect/datasource.py
index 15f8edafbf50..c9c3ffcc85fc 100644
--- a/python/pyspark/sql/connect/datasource.py
+++ b/python/pyspark/sql/connect/datasource.py
@@ -27,7 +27,7 @@ class DataSourceRegistration:
"""
Wrapper for data source registration.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
"""
def __init__(self, sparkSession: "SparkSession"):
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 6b17abdc5a9a..f96b614624b7 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -6499,12 +6499,12 @@ class DataFrame:
Use barrier mode execution, ensuring that all Python workers in
the stage will be
launched concurrently.
- .. versionadded: 3.5.0
+ .. versionadded:: 3.5.0
profile : :class:`pyspark.resource.ResourceProfile`. The optional
ResourceProfile
to be used for mapInPandas.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
Examples
@@ -6601,12 +6601,12 @@ class DataFrame:
Use barrier mode execution, ensuring that all Python workers in
the stage will be
launched concurrently.
- .. versionadded: 3.5.0
+ .. versionadded:: 3.5.0
profile : :class:`pyspark.resource.ResourceProfile`. The optional
ResourceProfile
to be used for mapInArrow.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
Examples
--------
diff --git a/python/pyspark/sql/datasource.py b/python/pyspark/sql/datasource.py
index 062dcec5c100..ad28136e8c36 100644
--- a/python/pyspark/sql/datasource.py
+++ b/python/pyspark/sql/datasource.py
@@ -82,7 +82,7 @@ class DataSource(ABC):
After implementing this interface, you can start to load your data source
using
``spark.read.format(...).load()`` and save data using
``df.write.format(...).save()``.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
"""
def __init__(self, options: MutableMapping[str, str]) -> None:
@@ -268,7 +268,7 @@ A tuple of strings representing a column reference.
For example, `("a", "b", "c")` represents the column `a.b.c`.
-.. versionadded: 4.1.0
+.. versionadded:: 4.1.0
"""
@@ -277,7 +277,7 @@ class Filter(ABC):
"""
The base class for filters used for filter pushdown.
- .. versionadded: 4.1.0
+ .. versionadded:: 4.1.0
Notes
-----
@@ -476,7 +476,7 @@ class InputPartition:
A base class representing an input partition returned by the `partitions()`
method of :class:`DataSourceReader`.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
Notes
-----
@@ -514,7 +514,7 @@ class DataSourceReader(ABC):
A base class for data source readers. Data source readers are responsible
for
outputting data from a data source.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
"""
def pushFilters(self, filters: List["Filter"]) -> Iterable["Filter"]:
@@ -534,7 +534,7 @@ class DataSourceReader(ABC):
It's recommended to implement this method only for data sources that
natively
support filtering, such as databases and GraphQL APIs.
- .. versionadded: 4.1.0
+ .. versionadded:: 4.1.0
Parameters
----------
@@ -686,7 +686,7 @@ class DataSourceStreamReader(ABC):
A base class for streaming data source readers. Data source stream readers
are responsible
for outputting data from a streaming data source.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
"""
def initialOffset(self) -> dict:
@@ -894,7 +894,7 @@ class SimpleDataSourceStreamReader(ABC):
Use :class:`DataSourceStreamReader` when read throughput is high and can't
be handled
by a single process.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
"""
def initialOffset(self) -> dict:
@@ -982,7 +982,7 @@ class DataSourceWriter(ABC):
A base class for data source writers. Data source writers are responsible
for saving
the data to the data source.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
"""
@abstractmethod
@@ -1052,7 +1052,7 @@ class DataSourceArrowWriter(DataSourceWriter):
is optimized for using the Arrow format when writing data. It can offer
better performance
when interfacing with systems or libraries that natively support Arrow.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
"""
@abstractmethod
@@ -1100,7 +1100,7 @@ class DataSourceStreamWriter(ABC):
A base class for data stream writers. Data stream writers are responsible
for writing
the data to the streaming sink.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
"""
@abstractmethod
@@ -1175,7 +1175,7 @@ class DataSourceStreamArrowWriter(DataSourceStreamWriter):
performance when interfacing with systems or libraries that natively
support Arrow for
streaming use cases.
- .. versionadded: 4.1.0
+ .. versionadded:: 4.1.0
"""
@abstractmethod
@@ -1225,7 +1225,7 @@ class WriterCommitMessage:
sent back to the driver side as input parameter of
:meth:`DataSourceWriter.commit`
or :meth:`DataSourceWriter.abort` method.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
Notes
-----
@@ -1240,7 +1240,7 @@ class DataSourceRegistration:
Wrapper for data source registration. This instance can be accessed by
:attr:`spark.dataSource`.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
"""
def __init__(self, sparkSession: "SparkSession"):
diff --git a/python/pyspark/sql/functions/builtin.py b/python/pyspark/sql/functions/builtin.py
index e28989756414..2e56ad92f1a0 100644
--- a/python/pyspark/sql/functions/builtin.py
+++ b/python/pyspark/sql/functions/builtin.py
@@ -15542,7 +15542,7 @@ def levenshtein(
if set when the levenshtein distance of the two given strings
less than or equal to a given threshold then return result distance,
or -1
- .. versionadded: 3.5.0
+ .. versionadded:: 3.5.0
Returns
-------
diff --git a/python/pyspark/sql/merge.py b/python/pyspark/sql/merge.py
index 7eaee915f432..dbbc8692ba50 100644
--- a/python/pyspark/sql/merge.py
+++ b/python/pyspark/sql/merge.py
@@ -31,7 +31,7 @@ class MergeIntoWriter:
`MergeIntoWriter` provides methods to define and execute merge actions
based
on specified conditions.
- .. versionadded: 4.0.0
+ .. versionadded:: 4.0.0
"""
def __init__(self, df: "DataFrame", table: str, condition: Column):
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index e289faf89997..afe8000b5c45 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -2464,7 +2464,7 @@ class DataFrameWriterV2:
Specifies a provider for the underlying output data source.
Spark's default catalog supports "parquet", "json", etc.
- .. versionadded: 3.1.0
+ .. versionadded:: 3.1.0
"""
self._jwriter.using(provider)
return self
@@ -2473,7 +2473,7 @@ class DataFrameWriterV2:
"""
Add a write option.
- .. versionadded: 3.1.0
+ .. versionadded:: 3.1.0
"""
if value is None:
return self
@@ -2484,7 +2484,7 @@ class DataFrameWriterV2:
"""
Add write options.
- .. versionadded: 3.1.0
+ .. versionadded:: 3.1.0
"""
options = {k: to_str(v) for k, v in options.items() if v is not None}
self._jwriter.options(options)
@@ -2494,7 +2494,7 @@ class DataFrameWriterV2:
"""
Add table property.
- .. versionadded: 3.1.0
+ .. versionadded:: 3.1.0
"""
self._jwriter.tableProperty(property, value)
return self
@@ -2525,7 +2525,7 @@ class DataFrameWriterV2:
* :py:func:`pyspark.sql.functions.hours`
* :py:func:`pyspark.sql.functions.bucket`
- .. versionadded: 3.1.0
+ .. versionadded:: 3.1.0
"""
from pyspark.sql.classic.column import _to_seq, _to_java_column
@@ -2550,7 +2550,7 @@ class DataFrameWriterV2:
The new table's schema, partition layout, properties, and other
configuration will be
based on the configuration set on this writer.
- .. versionadded: 3.1.0
+ .. versionadded:: 3.1.0
"""
self._jwriter.create()
@@ -2561,7 +2561,7 @@ class DataFrameWriterV2:
The existing table's schema, partition layout, properties, and other
configuration will be
replaced with the contents of the data frame and the configuration set
on this writer.
- .. versionadded: 3.1.0
+ .. versionadded:: 3.1.0
"""
self._jwriter.replace()
@@ -2574,7 +2574,7 @@ class DataFrameWriterV2:
and the configuration set on this writer.
If the table exists, its configuration and data will be replaced.
- .. versionadded: 3.1.0
+ .. versionadded:: 3.1.0
"""
self._jwriter.createOrReplace()
@@ -2582,7 +2582,7 @@ class DataFrameWriterV2:
"""
Append the contents of the data frame to the output table.
- .. versionadded: 3.1.0
+ .. versionadded:: 3.1.0
"""
self._jwriter.append()
@@ -2591,7 +2591,7 @@ class DataFrameWriterV2:
Overwrite rows matching the given filter condition with the contents
of the data frame in
the output table.
- .. versionadded: 3.1.0
+ .. versionadded:: 3.1.0
"""
from pyspark.sql.classic.column import _to_java_column
@@ -2606,7 +2606,7 @@ class DataFrameWriterV2:
This operation is equivalent to Hive's `INSERT OVERWRITE ...
PARTITION`, which replaces
partitions dynamically depending on the contents of the data frame.
- .. versionadded: 3.1.0
+ .. versionadded:: 3.1.0
"""
self._jwriter.overwritePartitions()