Re: [PR] GH-28859: [Doc][Python] Use only code-block directive and set up doctest for the python user guide [arrow]

via GitHub Tue, 13 Jan 2026 12:43:48 -0800


tadeja commented on code in PR #48619:
URL: https://github.com/apache/arrow/pull/48619#discussion_r2687339467



##########
docs/source/python/data.rst:
##########
@@ -553,20 +915,27 @@ Many functions in PyArrow either return or take as an argument a :class:`RecordB
 It can be used like any iterable of record batches, but also provides their common
 schema without having to get any of the batches.::

Review Comment:
   ```suggestion
   schema without having to get any of the batches.
   ```
   Should this `::` be removed, now that you added `.. code-block` below?
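
A rough way to spot other leftovers of this kind is to scan the `.rst` text for paragraph lines that end in `::` and are directly followed by a `.. code-block::` directive. The sketch below is only an illustration: the helper name `find_redundant_literal_markers` and the sample text are hypothetical and not part of the PR.

```python
def find_redundant_literal_markers(rst_text):
    """Return 1-based numbers of lines ending in '::' whose next
    non-blank line is a '.. code-block::' directive."""
    lines = rst_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip directive lines themselves (they start with '..').
        if stripped.startswith("..") or not stripped.endswith("::"):
            continue
        # Look at the first non-blank line after the '::' line.
        for following in lines[i + 1:]:
            if following.strip():
                if following.strip().startswith(".. code-block::"):
                    hits.append(i + 1)
                break
    return hits


# Hypothetical sample mirroring the hunk under review.
sample = """\
schema without having to get any of the batches.::

.. code-block:: python

    >>> reader.schema
"""

print(find_redundant_literal_markers(sample))
```

A check like this only flags candidates; whether a given `::` should go still depends on how the surrounding paragraph is meant to render.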



##########
docs/source/python/conftest.py:
##########
@@ -0,0 +1,35 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import pytest
+
+
+# Save output files from doctest examples into temp dir
+@pytest.fixture(autouse=True)
+def _docdir(request):
+    # Trigger ONLY for the doctests
+    is_doctest = request.node.__class__.__name__ == 'DoctestItem'

Review Comment:
       from _pytest.doctest import DoctestItem
       is_doctest = isinstance(request.node, DoctestItem)
   Maybe use this instead of comparing the class name as a string?
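
To illustrate the difference between the two checks without depending on pytest, here is a minimal sketch that uses stand-in classes. Every class below is hypothetical; in the real `conftest.py` the type would come from `_pytest.doctest`, which is part of pytest's internal package, so that import path is itself an assumption about a non-public location.

```python
class DoctestItem:
    """Hypothetical stand-in for _pytest.doctest.DoctestItem."""


class CustomDoctestItem(DoctestItem):
    """Hypothetical subclass, e.g. one provided by a plugin."""


# An unrelated class that merely shares the name "DoctestItem".
LookalikeItem = type("DoctestItem", (), {})


def is_doctest_by_name(node):
    # String comparison: matches on the class name only.
    return node.__class__.__name__ == "DoctestItem"


def is_doctest_by_type(node):
    # Type check: matches the class and any of its subclasses.
    return isinstance(node, DoctestItem)


for node in (DoctestItem(), CustomDoctestItem(), LookalikeItem()):
    print(type(node).__name__, is_doctest_by_name(node), is_doctest_by_type(node))
```

The name comparison misses the subclass and accepts the lookalike, while `isinstance` handles both cases as intended. The trade-off is that the type check needs the import from the internal module.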



##########
docs/source/python/parquet.rst:
##########
@@ -463,26 +557,38 @@ the same:
 
 .. code-block:: python
 
-   metadata_collector = []
-   pq.write_table(
-       table1, root_path / "year=2017/data1.parquet",
-       metadata_collector=metadata_collector
-   )
-
-   # set the file path relative to the root of the partitioned dataset
-   metadata_collector[-1].set_file_path("year=2017/data1.parquet")
-
-   # combine and write the metadata
-   metadata = metadata_collector[0]
-   for _meta in metadata_collector[1:]:
-       metadata.append_row_groups(_meta)
-   metadata.write_metadata_file(root_path / "_metadata")
-
-   # or use pq.write_metadata to combine and write in a single step
-   pq.write_metadata(
-       table1.schema, root_path / "_metadata",
-       metadata_collector=metadata_collector
-   )
+   >>> import os
+   >>> os.mkdir("year=2017")
+
+   >>> metadata_collector = []
+   >>> pq.write_table(
+   ...     table, "year=2017/data1.parquet",
+   ...     metadata_collector=metadata_collector
+   ... )
+
+   >>> # set the file path relative to the root of the partitioned dataset
+   >>> metadata_collector[-1].set_file_path("year=2017/data1.parquet")
+
+   >>> # combine and write the metadata
+   >>> metadata = metadata_collector[0]
+   >>> for _meta in metadata_collector[1:]:
+   ...     metadata.append_row_groups(_meta)
+   >>> metadata.write_metadata_file("_metadata")
+
+   >>> # or use pq.write_metadata to combine and write in a single step
+   >>> pq.write_metadata(
+   ...     table.schema, "_metadata",
+   ...     metadata_collector=metadata_collector
+   ... )
+
+    >>> pq.read_metadata("_metadata")
+    <pyarrow._parquet.FileMetaData object at ...>
+      created_by: parquet-cpp-arrow version ...
+      num_columns: 3
+      num_rows: 3
+      num_row_groups: 1
+      format_version: 2.6
+      serialized_size: ...

Review Comment:
   ```suggestion
      >>> pq.read_metadata("_metadata")
      <pyarrow._parquet.FileMetaData object at ...>
        created_by: parquet-cpp-arrow version ...
        num_columns: 3
        num_rows: 3
        num_row_groups: 1
        format_version: 2.6
        serialized_size: ...
   ```
   
   I removed the fourth space character to align `>>> pq.read_metadata("_metadata")` with the lines above. I guess the output lines that follow need the same indent.
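
Misaligned prompts like this can be found mechanically by grouping the `>>>` lines of a block by their indentation width. The following is a small sketch under that assumption; the helper name `prompt_indents` is hypothetical, and the sample is an excerpt of the hunk above with the leading `+` diff marker dropped.

```python
def prompt_indents(block):
    """Map each indentation width used by '>>>' lines to the
    1-based line numbers that use it."""
    indents = {}
    for number, line in enumerate(block.splitlines(), start=1):
        if line.lstrip().startswith(">>>"):
            width = len(line) - len(line.lstrip())
            indents.setdefault(width, []).append(number)
    return indents


block = """\
   >>> pq.write_metadata(
   ...     table.schema, "_metadata",
   ...     metadata_collector=metadata_collector
   ... )

    >>> pq.read_metadata("_metadata")
"""

result = prompt_indents(block)
print(result)
# More than one key means the prompts in this block are not aligned.
print(len(result) > 1)
```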



##########
docs/source/python/csv.rst:
##########
@@ -83,15 +88,21 @@ Customized parsing
 
 To alter the default parsing settings in case of reading CSV files with an
 unusual structure, you should create a :class:`ParseOptions` instance
-and pass it to :func:`read_csv`::
-
-    import pyarrow as pa
-    import pyarrow.csv as csv
-
-    table = csv.read_csv('tips.csv.gz', parse_options=csv.ParseOptions(
-       delimiter=";",
-       invalid_row_handler=skip_handler
-    ))
+and pass it to :func:`read_csv`:
+
+.. code-block:: python
+
+    >>> def skip_handler(row):
+    ...     pass
+    >>> table = csv.read_csv('tips.csv.gz', parse_options=csv.ParseOptions(
+    ...    delimiter=";",
+    ...    invalid_row_handler=skip_handler
+    ... ))
+    >>> table
+    pyarrow.Table
+    col1,"col2": string
+    ----
+    col1,"col2": [["1,"a"","2,"b"","3,"c""]]

Review Comment:
   The `;` delimiter and the resulting table output in this example are intentionally funkier than the other two table outputs in `csv.rst`, right?
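
For context on why the output looks this way: when comma-separated data is parsed with `;` as the delimiter, each line is kept as a single field. The sketch below shows the same effect with Python's standard `csv` module as a stand-in for `pyarrow.csv`. The file contents are an assumption inferred from the table output in the diff, not taken from the actual `tips.csv.gz`.

```python
import csv
import io

# Assumed file contents, inferred from the printed table in the diff.
data = 'col1,"col2"\n1,"a"\n2,"b"\n3,"c"\n'


def parse(text, delimiter):
    """Parse CSV text into a list of rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


with_semicolon = parse(data, ";")
with_comma = parse(data, ",")

# With ';' no line is split, so each row has exactly one field.
print(with_semicolon)
# With ',' each row splits into two fields and the quotes are consumed.
print(with_comma)
```

With `;` the header row becomes the single field `col1,"col2"`, which matches the single column name shown in the doctest output.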



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
