This is an automated email from the ASF dual-hosted git repository.
alenka pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 07a30d9a57 GH-41611: [Docs][CI] Enable most sphinx-lint rules for documentation (#41612)
07a30d9a57 is described below
commit 07a30d9a5784852187d100660325b8c12b4ff6c8
Author: Bryce Mecum <[email protected]>
AuthorDate: Thu May 16 03:30:14 2024 -0800
GH-41611: [Docs][CI] Enable most sphinx-lint rules for documentation (#41612)
### Rationale for this change
https://github.com/apache/arrow/issues/41611
### What changes are included in this PR?
- Update the pre-commit config to enable all sphinx-lint checks except `dangling-hyphen` and `line-too-long`
- Associated documentation fixes
### Are these changes tested?
Yes, by building and looking at the docs locally.
### Are there any user-facing changes?
Just docs.
* GitHub Issue: #41611
Authored-by: Bryce Mecum <[email protected]>
Signed-off-by: AlenkaF <[email protected]>
---
.pre-commit-config.yaml | 10 +++++++--
docs/source/conf.py | 2 +-
docs/source/cpp/acero/developer_guide.rst | 10 ++++-----
docs/source/cpp/acero/overview.rst | 26 +++++++++++-----------
docs/source/cpp/acero/user_guide.rst | 8 +++----
docs/source/cpp/build_system.rst | 2 +-
docs/source/cpp/compute.rst | 18 +++++++--------
docs/source/developers/cpp/building.rst | 2 +-
docs/source/developers/documentation.rst | 2 +-
.../guide/step_by_step/arrow_codebase.rst | 4 ++--
.../developers/guide/step_by_step/set_up.rst | 8 +++----
docs/source/developers/java/development.rst | 2 +-
docs/source/developers/release.rst | 4 ++--
docs/source/format/CanonicalExtensions.rst | 4 ++--
docs/source/format/Columnar.rst | 6 ++---
docs/source/format/FlightSql.rst | 2 +-
docs/source/format/Integration.rst | 2 +-
docs/source/java/algorithm.rst | 2 +-
docs/source/java/flight_sql_jdbc_driver.rst | 2 +-
docs/source/java/install.rst | 2 +-
docs/source/java/ipc.rst | 2 +-
docs/source/java/quickstartguide.rst | 16 ++++++-------
docs/source/java/substrait.rst | 20 ++++++++---------
docs/source/java/table.rst | 16 ++++++-------
docs/source/python/api/compute.rst | 2 +-
docs/source/python/data.rst | 4 ++--
docs/source/python/extending_types.rst | 2 +-
docs/source/python/filesystems.rst | 4 ++--
docs/source/python/install.rst | 2 +-
docs/source/python/integration/extending.rst | 2 +-
docs/source/python/memory.rst | 2 +-
docs/source/python/timestamps.rst | 2 +-
32 files changed, 99 insertions(+), 93 deletions(-)
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index bf5ca08d53..7dcc1c9816 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -136,5 +136,11 @@ repos:
rev: v0.9.1
hooks:
- id: sphinx-lint
- files: ^docs/
- args: ['--disable', 'all', '--enable', 'trailing-whitespace,missing-final-newline', 'docs']
+ files: ^docs/source
+ exclude: ^docs/source/python/generated
+ args: [
+ '--enable',
+ 'all',
+ '--disable',
+ 'dangling-hyphen,line-too-long',
+ ]
diff --git a/docs/source/conf.py b/docs/source/conf.py
index b487200555..1e6c113e33 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -535,7 +535,7 @@ latex_documents = [
#
# latex_appendices = []
-# It false, will not define \strong, \code, itleref, \crossref ... but only
+# It false, will not define \strong, \code, \titleref, \crossref ... but only
# \sphinxstrong, ..., \sphinxtitleref, ... To help avoid clash with user added
# packages.
#
diff --git a/docs/source/cpp/acero/developer_guide.rst b/docs/source/cpp/acero/developer_guide.rst
index 80ca68556f..7dd08fe3ce 100644
--- a/docs/source/cpp/acero/developer_guide.rst
+++ b/docs/source/cpp/acero/developer_guide.rst
@@ -327,8 +327,8 @@ An engine could choose to create a thread task for every execution of a node. H
this leads to problems with cache locality. For example, let's assume we have a basic plan consisting of three
exec nodes, scan, project, and then filter (this is a very common use case). Now let's assume there are 100 batches.
In a task-per-operator model we would have tasks like "Scan Batch 5", "Project Batch 5", and "Filter Batch 5". Each
-of those tasks is potentially going to access the same data. For example, maybe the `project` and `filter` nodes need
-to read the same column. A column which is intially created in a decode phase of the `scan` node. To maximize cache
+of those tasks is potentially going to access the same data. For example, maybe the ``project`` and ``filter`` nodes need
+to read the same column. A column which is intially created in a decode phase of the ``scan`` node. To maximize cache
utilization we would need to carefully schedule our tasks to ensure that all three of those tasks are run consecutively
and assigned to the same CPU core.
@@ -412,7 +412,7 @@ Ordered Execution
=================
Some nodes either establish an ordering to their outgoing batches or they need to be able to process batches in order.
-Acero handles ordering using the `batch_index` property on an ExecBatch. If a node has a deterministic output order
+Acero handles ordering using the ``batch_index`` property on an ExecBatch. If a node has a deterministic output order
then it should apply a batch index on batches that it emits. For example, the OrderByNode applies a new ordering to
batches (regardless of the incoming ordering). The scan node is able to attach an implicit ordering to batches which
reflects the order of the rows in the files being scanned.
@@ -461,8 +461,8 @@ Acero's tracing is currently half-implemented and there are major gaps in profil
effort at tracing with open telemetry and most of the necessary pieces are in place. The main thing currently lacking is
some kind of effective visualization of the tracing results.
-In order to use the tracing that is present today you will need to build with Arrow with `ARROW_WITH_OPENTELEMETRY=ON`.
-Then you will need to set the environment variable `ARROW_TRACING_BACKEND=otlp_http`. This will configure open telemetry
+In order to use the tracing that is present today you will need to build with Arrow with ``ARROW_WITH_OPENTELEMETRY=ON``.
+Then you will need to set the environment variable ``ARROW_TRACING_BACKEND=otlp_http``. This will configure open telemetry
to export trace results (as OTLP) to the HTTP endpoint http://localhost:4318/v1/traces. You will need to configure an
open telemetry collector to collect results on that endpoint and you will need to configure a trace viewer of some kind
such as Jaeger: https://www.jaegertracing.io/docs/1.21/opentelemetry/
diff --git a/docs/source/cpp/acero/overview.rst b/docs/source/cpp/acero/overview.rst
index 8be4cbc1b1..34e0b143bc 100644
--- a/docs/source/cpp/acero/overview.rst
+++ b/docs/source/cpp/acero/overview.rst
@@ -209,16 +209,16 @@ must have the same length. There are a few key differences from ExecBatch:
Both the record batch and the exec batch have strong ownership of the arrays & buffers
-* An `ExecBatch` does not have a schema. This is because an `ExecBatch` is assumed to be
+* An ``ExecBatch`` does not have a schema. This is because an ``ExecBatch`` is assumed to be
part of a stream of batches and the stream is assumed to have a consistent schema. So
- the schema for an `ExecBatch` is typically stored in the ExecNode.
-* Columns in an `ExecBatch` are either an `Array` or a `Scalar`. When a column is a `Scalar`
- this means that the column has a single value for every row in the batch. An `ExecBatch`
+ the schema for an ``ExecBatch`` is typically stored in the ExecNode.
+* Columns in an ``ExecBatch`` are either an ``Array`` or a ``Scalar``. When a column is a ``Scalar``
+ this means that the column has a single value for every row in the batch. An ``ExecBatch``
also has a length property which describes how many rows are in a batch. So another way to
- view a `Scalar` is a constant array with `length` elements.
-* An `ExecBatch` contains additional information used by the exec plan. For example, an
- `index` can be used to describe a batch's position in an ordered stream. We expect
- that `ExecBatch` will also evolve to contain additional fields such as a selection vector.
+ view a ``Scalar`` is a constant array with ``length`` elements.
+* An ``ExecBatch`` contains additional information used by the exec plan. For example, an
+ ``index`` can be used to describe a batch's position in an ordered stream. We expect
+ that ``ExecBatch`` will also evolve to contain additional fields such as a selection vector.
.. figure:: scalar_vs_array.svg
@@ -231,8 +231,8 @@ only zero copy if there are no scalars in the exec batch.
.. note::
Both Acero and the compute module have "lightweight" versions of batches and arrays.
- In the compute module these are called `BatchSpan`, `ArraySpan`, and `BufferSpan`. In
- Acero the concept is called `KeyColumnArray`. These types were developed concurrently
+ In the compute module these are called ``BatchSpan``, ``ArraySpan``, and ``BufferSpan``. In
+ Acero the concept is called ``KeyColumnArray``. These types were developed concurrently
and serve the same purpose. They aim to provide an array container that can be completely
stack allocated (provided the data type is non-nested) in order to avoid heap allocation
overhead. Ideally these two concepts will be merged someday.
@@ -247,9 +247,9 @@ execution of the nodes. Both ExecPlan and ExecNode are tied to the lifecycle of
They have state and are not expected to be restartable.
.. warning::
- The structures within Acero, including `ExecBatch`, are still experimental. The `ExecBatch`
- class should not be used outside of Acero. Instead, an `ExecBatch` should be converted to
- a more standard structure such as a `RecordBatch`.
+ The structures within Acero, including ``ExecBatch``, are still experimental. The ``ExecBatch``
+ class should not be used outside of Acero. Instead, an ``ExecBatch`` should be converted to
+ a more standard structure such as a ``RecordBatch``.
Similarly, an ExecPlan is an internal concept. Users creating plans should be using Declaration
objects. APIs for consuming and executing plans should abstract away the details of the underlying
diff --git a/docs/source/cpp/acero/user_guide.rst b/docs/source/cpp/acero/user_guide.rst
index adcc17216e..0271be2180 100644
--- a/docs/source/cpp/acero/user_guide.rst
+++ b/docs/source/cpp/acero/user_guide.rst
@@ -455,8 +455,8 @@ can be selected from :ref:`this list of aggregation functions
will be added which should alleviate this constraint.
The aggregation can provide results as a group or scalar. For instances,
-an operation like `hash_count` provides the counts per each unique record
-as a grouped result while an operation like `sum` provides a single record.
+an operation like ``hash_count`` provides the counts per each unique record
+as a grouped result while an operation like ``sum`` provides a single record.
Scalar Aggregation example:
@@ -490,7 +490,7 @@ caller will repeatedly call this function until the generator function is exhaus
will accumulate in memory. An execution plan should only have one
"terminal" node (one sink node). An :class:`ExecPlan` can terminate early due to cancellation or
an error, before the output is fully consumed. However, the plan can be safely destroyed independently
-of the sink, which will hold the unconsumed batches by `exec_plan->finished()`.
+of the sink, which will hold the unconsumed batches by ``exec_plan->finished()``.
As a part of the Source Example, the Sink operation is also included;
@@ -515,7 +515,7 @@ The consuming function may be called before a previous invocation has completed.
function does not run quickly enough then many concurrent executions could pile up, blocking the
CPU thread pool. The execution plan will not be marked finished until all consuming function callbacks
have been completed.
-Once all batches have been delivered the execution plan will wait for the `finish` future to complete
+Once all batches have been delivered the execution plan will wait for the ``finish`` future to complete
before marking the execution plan finished. This allows for workflows where the consumption function
converts batches into async tasks (this is currently done internally for the dataset write node).
diff --git a/docs/source/cpp/build_system.rst b/docs/source/cpp/build_system.rst
index 0c94d7e5ce..e80bca4c94 100644
--- a/docs/source/cpp/build_system.rst
+++ b/docs/source/cpp/build_system.rst
@@ -167,7 +167,7 @@ file into an executable linked with the Arrow C++ shared library:
.. code-block:: makefile
my_example: my_example.cc
- $(CXX) -o $@ $(CXXFLAGS) $< $$(pkg-config --cflags --libs arrow)
+ $(CXX) -o $@ $(CXXFLAGS) $< $$(pkg-config --cflags --libs arrow)
Many build systems support pkg-config. For example:
diff --git a/docs/source/cpp/compute.rst b/docs/source/cpp/compute.rst
index 546b6e5716..701c7d573a 100644
--- a/docs/source/cpp/compute.rst
+++ b/docs/source/cpp/compute.rst
@@ -514,8 +514,8 @@ Mixed time resolution temporal inputs will be cast to finest input resolution.
+------------+---------------------------------------------+
It's compatible with Redshift's decimal promotion rules. All decimal digits
- are preserved for `add`, `subtract` and `multiply` operations. The result
- precision of `divide` is at least the sum of precisions of both operands with
+ are preserved for ``add``, ``subtract`` and ``multiply`` operations. The result
+ precision of ``divide`` is at least the sum of precisions of both operands with
enough scale kept. Error is returned if the result precision is beyond the
decimal value range.
@@ -1029,7 +1029,7 @@ These functions trim off characters on both sides (trim), or the left (ltrim) or
+--------------------------+------------+-------------------------+---------------------+----------------------------------------+---------+
* \(1) Only characters specified in :member:`TrimOptions::characters` will be
- trimmed off. Both the input string and the `characters` argument are
+ trimmed off. Both the input string and the ``characters`` argument are
interpreted as ASCII characters.
* \(2) Only trim off ASCII whitespace characters (``'\t'``, ``'\n'``, ``'\v'``,
@@ -1570,7 +1570,7 @@ is the same, even though the UTC years would be different.
Timezone handling
~~~~~~~~~~~~~~~~~
-`assume_timezone` function is meant to be used when an external system produces
+``assume_timezone`` function is meant to be used when an external system produces
"timezone-naive" timestamps which need to be converted to "timezone-aware"
timestamps (see for example the `definition
<https://docs.python.org/3/library/datetime.html#aware-and-naive-objects>`__
@@ -1581,11 +1581,11 @@ Input timestamps are assumed to be relative to the timezone given in
UTC-relative timestamps with the timezone metadata set to the above value.
An error is returned if the timestamps already have the timezone metadata set.
-`local_timestamp` function converts UTC-relative timestamps to local "timezone-naive"
+``local_timestamp`` function converts UTC-relative timestamps to local "timezone-naive"
timestamps. The timezone is taken from the timezone metadata of the input
-timestamps. This function is the inverse of `assume_timezone`. Please note:
+timestamps. This function is the inverse of ``assume_timezone``. Please note:
**all temporal functions already operate on timestamps as if they were in local
-time of the metadata provided timezone**. Using `local_timestamp` is only meant to be
+time of the metadata provided timezone**. Using ``local_timestamp`` is only meant to be
used when an external system expects local timestamps.
+-----------------+-------+-------------+---------------+---------------------------------+-------+
@@ -1649,8 +1649,8 @@ overflow is detected.
* \(1) CumulativeOptions has two optional parameters. The first parameter
:member:`CumulativeOptions::start` is a starting value for the running
- accumulation. It has a default value of 0 for `sum`, 1 for `prod`, min of
- input type for `max`, and max of input type for `min`. Specified values of
+ accumulation. It has a default value of 0 for ``sum``, 1 for ``prod``, min of
+ input type for ``max``, and max of input type for ``min``. Specified values of
``start`` must be castable to the input type. The second parameter
:member:`CumulativeOptions::skip_nulls` is a boolean. When set to
false (the default), the first encountered null is propagated. When set to
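The timezone hunks above describe ``assume_timezone`` (attach a zone to naive timestamps, erroring if one is already set) and ``local_timestamp`` (its inverse). A rough stdlib analogy in Python, purely illustrative and not the Arrow compute API:

```python
from datetime import datetime, timedelta, timezone

def assume_timezone(naive: datetime, tz: timezone) -> datetime:
    """Interpret a timezone-naive timestamp as local time in tz."""
    if naive.tzinfo is not None:
        # mirrors the docs: error if timezone metadata is already set
        raise ValueError("timestamp already has timezone metadata")
    return naive.replace(tzinfo=tz)

def local_timestamp(aware: datetime) -> datetime:
    """Inverse: drop the zone, keeping the local wall-clock time."""
    return aware.replace(tzinfo=None)

tz = timezone(timedelta(hours=2))
t = datetime(2024, 5, 16, 12, 0)
assert local_timestamp(assume_timezone(t, tz)) == t  # round-trips
```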
diff --git a/docs/source/developers/cpp/building.rst b/docs/source/developers/cpp/building.rst
index 7b80d2138c..b052b856c9 100644
--- a/docs/source/developers/cpp/building.rst
+++ b/docs/source/developers/cpp/building.rst
@@ -312,7 +312,7 @@ depends on ``python`` being available).
On some Linux distributions, running the test suite might require setting an
explicit locale. If you see any locale-related errors, try setting the
-environment variable (which requires the `locales` package or equivalent):
+environment variable (which requires the ``locales`` package or equivalent):
.. code-block::
diff --git a/docs/source/developers/documentation.rst b/docs/source/developers/documentation.rst
index 8b1ea28c0f..a479065f62 100644
--- a/docs/source/developers/documentation.rst
+++ b/docs/source/developers/documentation.rst
@@ -259,7 +259,7 @@ Build the docs in the target directory:
sphinx-build ./source/developers ./source/developers/_build -c ./source -D master_doc=temp_index
This builds everything in the target directory to a folder inside of it
-called ``_build`` using the config file in the `source` directory.
+called ``_build`` using the config file in the ``source`` directory.
Once you have verified the HTML documents, you can remove temporary index file:
diff --git a/docs/source/developers/guide/step_by_step/arrow_codebase.rst b/docs/source/developers/guide/step_by_step/arrow_codebase.rst
index 0beece991b..0c194ab3a3 100644
--- a/docs/source/developers/guide/step_by_step/arrow_codebase.rst
+++ b/docs/source/developers/guide/step_by_step/arrow_codebase.rst
@@ -99,8 +99,8 @@ can be called from a function in another language. After a function is defined
C++ we must create the binding manually to use it in that implementation.
.. note::
- There is much you can learn by checking **Pull Requests**
- and **unit tests** for similar issues.
+ There is much you can learn by checking **Pull Requests**
+ and **unit tests** for similar issues.
.. tab-set::
diff --git a/docs/source/developers/guide/step_by_step/set_up.rst b/docs/source/developers/guide/step_by_step/set_up.rst
index 9a2177568d..9c808ceee7 100644
--- a/docs/source/developers/guide/step_by_step/set_up.rst
+++ b/docs/source/developers/guide/step_by_step/set_up.rst
@@ -118,10 +118,10 @@ Should give you a result similar to this:
.. code:: console
- origin https://github.com/<your username>/arrow.git (fetch)
- origin https://github.com/<your username>/arrow.git (push)
- upstream https://github.com/apache/arrow (fetch)
- upstream https://github.com/apache/arrow (push)
+ origin https://github.com/<your username>/arrow.git (fetch)
+ origin https://github.com/<your username>/arrow.git (push)
+ upstream https://github.com/apache/arrow (fetch)
+ upstream https://github.com/apache/arrow (push)
If you did everything correctly, you should now have a copy of the code
in the ``arrow`` directory and two remotes that refer to your own GitHub
diff --git a/docs/source/developers/java/development.rst b/docs/source/developers/java/development.rst
index 17d47c324c..3f0ff6cdd0 100644
--- a/docs/source/developers/java/development.rst
+++ b/docs/source/developers/java/development.rst
@@ -118,7 +118,7 @@ This checks the code style of all source code under the current directory or fro
$ mvn checkstyle:check
-Maven `pom.xml` style is enforced with Spotless using `Apache Maven pom.xml guidelines`_
+Maven ``pom.xml`` style is enforced with Spotless using `Apache Maven pom.xml guidelines`_
You can also just check the style without building the project.
This checks the style of all pom.xml files under the current directory or from within an individual module.
diff --git a/docs/source/developers/release.rst b/docs/source/developers/release.rst
index 0b3a83dc5a..d903cc71bd 100644
--- a/docs/source/developers/release.rst
+++ b/docs/source/developers/release.rst
@@ -106,7 +106,7 @@ If there is consensus and there is a Release Manager willing to take the effort
the release a patch release can be created.
Committers can tag issues that should be included on the next patch release using the
-`backport-candidate` label. Is the responsability of the author or the committer to add the
+``backport-candidate`` label. Is the responsability of the author or the committer to add the
label to the issue to help the Release Manager identify the issues that should be backported.
If a specific issue is identified as the reason to create a patch release the Release Manager
@@ -117,7 +117,7 @@ Be sure to go through on the following checklist:
#. Create milestone
#. Create maintenance branch
#. Include issue that was requested as requiring new patch release
-#. Add new milestone to issues with `backport-candidate` label
+#. Add new milestone to issues with ``backport-candidate`` label
#. cherry-pick issues into maintenance branch
Creating a Release Candidate
diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst
index c60f095dd3..c258f889dc 100644
--- a/docs/source/format/CanonicalExtensions.rst
+++ b/docs/source/format/CanonicalExtensions.rst
@@ -77,7 +77,7 @@ Official List
Fixed shape tensor
==================
-* Extension name: `arrow.fixed_shape_tensor`.
+* Extension name: ``arrow.fixed_shape_tensor``.
* The storage type of the extension: ``FixedSizeList`` where:
@@ -153,7 +153,7 @@ Fixed shape tensor
Variable shape tensor
=====================
-* Extension name: `arrow.variable_shape_tensor`.
+* Extension name: ``arrow.variable_shape_tensor``.
* The storage type of the extension is: ``StructArray`` where struct
is composed of **data** and **shape** fields describing a single
diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst
index ec6a7fa5e3..7c853de782 100644
--- a/docs/source/format/Columnar.rst
+++ b/docs/source/format/Columnar.rst
@@ -312,7 +312,7 @@ Each value in this layout consists of 0 or more bytes. While primitive
arrays have a single values buffer, variable-size binary have an
**offsets** buffer and **data** buffer.
-The offsets buffer contains `length + 1` signed integers (either
+The offsets buffer contains ``length + 1`` signed integers (either
32-bit or 64-bit, depending on the logical type), which encode the
start position of each slot in the data buffer. The length of the
value in each slot is computed using the difference between the offset
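The hunk above fixes markup around the offsets-buffer description; the ``length + 1`` rule itself is simple enough to sketch in Python (a toy model of the layout, not Arrow's implementation):

```python
def build_offsets(values):
    """Offsets buffer for a variable-size binary layout: length + 1
    integers, where slot i spans data[offsets[i]:offsets[i + 1]]."""
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v))
    return offsets

data = [b"joe", b"", b"mark"]
print(build_offsets(data))  # [0, 3, 3, 7]
```

The length of each slot is the difference between consecutive offsets, exactly as the surrounding text describes.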
@@ -374,7 +374,7 @@ locations are indicated using a **views** buffer, which may point to one
of potentially several **data** buffers or may contain the characters
inline.
-The views buffer contains `length` view structures with the following layout:
+The views buffer contains ``length`` view structures with the following layout:
::
@@ -394,7 +394,7 @@ should be interpreted.
In the short string case the string's bytes are inlined — stored inside the
view itself, in the twelve bytes which follow the length. Any remaining bytes
-after the string itself are padded with `0`.
+after the string itself are padded with ``0``.
In the long string case, a buffer index indicates which data buffer
stores the data bytes and an offset indicates where in that buffer the
diff --git a/docs/source/format/FlightSql.rst b/docs/source/format/FlightSql.rst
index 9c3523755f..b4b85e77a2 100644
--- a/docs/source/format/FlightSql.rst
+++ b/docs/source/format/FlightSql.rst
@@ -193,7 +193,7 @@ in the ``app_metadata`` field of the Flight RPC ``PutResult`` returned.
When used with DoPut: load the stream of Arrow record batches into
the specified target table and return the number of rows ingested
- via a `DoPutUpdateResult` message.
+ via a ``DoPutUpdateResult`` message.
Flight Server Session Management
--------------------------------
diff --git a/docs/source/format/Integration.rst b/docs/source/format/Integration.rst
index c800255687..436747989a 100644
--- a/docs/source/format/Integration.rst
+++ b/docs/source/format/Integration.rst
@@ -501,7 +501,7 @@ integration testing actually tests.
There are two types of integration test cases: the ones populated on the fly
by the data generator in the Archery utility, and *gold* files that exist
-in the `arrow-testing
<https://github.com/apache/arrow-testing/tree/master/data/arrow-ipc-stream/integration>`
+in the `arrow-testing
<https://github.com/apache/arrow-testing/tree/master/data/arrow-ipc-stream/integration>`_
repository.
Data Generator Tests
diff --git a/docs/source/java/algorithm.rst b/docs/source/java/algorithm.rst
index 06ed32bd48..d4838967d6 100644
--- a/docs/source/java/algorithm.rst
+++ b/docs/source/java/algorithm.rst
@@ -82,7 +82,7 @@ for fixed width and variable width vectors, respectively. Both algorithms run in
3. **Index sorter**: this sorter does not actually sort the vector. Instead, it returns an integer
vector, which correspond to indices of vector elements in sorted order. With the index vector, one can
-easily construct a sorted vector. In addition, some other tasks can be easily achieved, like finding the ``k``th
+easily construct a sorted vector. In addition, some other tasks can be easily achieved, like finding the ``k`` th
smallest value in the vector. Index sorting is supported by ``org.apache.arrow.algorithm.sort.IndexSorter``,
which runs in ``O(nlog(n))`` time. It is applicable to vectors of any type.
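The index-sorter hunk above describes returning sorted indices instead of moving the data; a minimal Python sketch of the same idea (illustrative, not the Arrow Java API):

```python
def index_sort(values):
    """Return indices of `values` in sorted order, without mutating it."""
    return sorted(range(len(values)), key=values.__getitem__)

v = [30, 10, 20]
idx = index_sort(v)
print(idx)        # [1, 2, 0]
print(v[idx[0]])  # 10, i.e. the smallest value, found via the index vector
```

Walking `idx` in order yields the sorted sequence, which is how the k-th smallest value is found without constructing a sorted copy.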
diff --git a/docs/source/java/flight_sql_jdbc_driver.rst b/docs/source/java/flight_sql_jdbc_driver.rst
index cc8822247b..f95c2ac755 100644
--- a/docs/source/java/flight_sql_jdbc_driver.rst
+++ b/docs/source/java/flight_sql_jdbc_driver.rst
@@ -162,7 +162,7 @@ the Flight SQL service as gRPC headers. For example, the following URI ::
This will connect without authentication or encryption, to a Flight
SQL service running on ``localhost`` on port 12345. Each request will
-also include a `database=mydb` gRPC header.
+also include a ``database=mydb`` gRPC header.
Connection parameters may also be supplied using the Properties object
when using the JDBC Driver Manager to connect. When supplying using
diff --git a/docs/source/java/install.rst b/docs/source/java/install.rst
index a551edc36c..dc6a55c87f 100644
--- a/docs/source/java/install.rst
+++ b/docs/source/java/install.rst
@@ -63,7 +63,7 @@ Modifying the command above for Flight:
Otherwise, you may see errors like ``java.lang.IllegalAccessError: superclass
access check failed: class
org.apache.arrow.flight.ArrowMessage$ArrowBufRetainingCompositeByteBuf (in
module org.apache.arrow.flight.core)
cannot access class io.netty.buffer.CompositeByteBuf (in unnamed module ...)
because module
-org.apache.arrow.flight.core does not read unnamed module ...
+org.apache.arrow.flight.core does not read unnamed module ...``
Finally, if you are using arrow-dataset, you'll also need to report that JDK
internals need to be exposed.
Modifying the command above for arrow-memory:
diff --git a/docs/source/java/ipc.rst b/docs/source/java/ipc.rst
index 01341ff2cc..f593917917 100644
--- a/docs/source/java/ipc.rst
+++ b/docs/source/java/ipc.rst
@@ -81,7 +81,7 @@ Here we used an in-memory stream, but this could have been a socket or some othe
writer.end();
Note that, since the :class:`VectorSchemaRoot` in the writer is a container
that can hold batches, batches flow through
-:class:`VectorSchemaRoot` as part of a pipeline, so we need to populate data before `writeBatch`, so that later batches
+:class:`VectorSchemaRoot` as part of a pipeline, so we need to populate data before ``writeBatch``, so that later batches
could overwrite previous ones.
Now the :class:`ByteArrayOutputStream` contains the complete stream which
contains 5 record batches.
diff --git a/docs/source/java/quickstartguide.rst b/docs/source/java/quickstartguide.rst
index a71ddc5b5e..1f3ec861d3 100644
--- a/docs/source/java/quickstartguide.rst
+++ b/docs/source/java/quickstartguide.rst
@@ -195,10 +195,10 @@ Example: Create a dataset of names (strings) and ages (32-bit signed integers).
.. code-block:: shell
VectorSchemaRoot created:
- age name
- 10 Dave
- 20 Peter
- 30 Mary
+ age name
+ 10 Dave
+ 20 Peter
+ 30 Mary
Interprocess Communication (IPC)
@@ -306,10 +306,10 @@ Example: Read the dataset from the previous example from an Arrow IPC file (rand
Record batches in file: 1
VectorSchemaRoot read:
- age name
- 10 Dave
- 20 Peter
- 30 Mary
+ age name
+ 10 Dave
+ 20 Peter
+ 30 Mary
More examples available at `Arrow Java Cookbook`_.
diff --git a/docs/source/java/substrait.rst b/docs/source/java/substrait.rst
index c5857dcc23..fa20dbd61d 100644
--- a/docs/source/java/substrait.rst
+++ b/docs/source/java/substrait.rst
@@ -100,9 +100,9 @@ Here is an example of a Java program that queries a Parquet file using Java Subs
.. code-block:: text
// Results example:
- FieldPath(0) FieldPath(1) FieldPath(2) FieldPath(3)
- 0 ALGERIA 0 haggle. carefully final deposits detect slyly agai
- 1 ARGENTINA 1 al foxes promise slyly according to the regular accounts. bold requests alon
+ FieldPath(0) FieldPath(1) FieldPath(2) FieldPath(3)
+ 0 ALGERIA 0 haggle. carefully final deposits detect slyly agai
+ 1 ARGENTINA 1 al foxes promise slyly according to the regular accounts. bold requests alon
Executing Projections and Filters Using Extended Expressions
============================================================
@@ -189,13 +189,13 @@ This Java program:
.. code-block:: text
- column-1 column-2
- 13 ROMANIA - ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account
- 14 SAUDI ARABIA - ts. silent requests haggle. closely express packages sleep across the blithely
- 12 VIETNAM - hely enticingly express accounts. even, final
- 13 RUSSIA - requests against the platelets use never according to the quickly regular pint
- 13 UNITED KINGDOM - eans boost carefully special requests. accounts are. carefull
- 11 UNITED STATES - y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be
+ column-1 column-2
+ 13 ROMANIA - ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account
+ 14 SAUDI ARABIA - ts. silent requests haggle. closely express packages sleep across the blithely
+ 12 VIETNAM - hely enticingly express accounts. even, final
+ 13 RUSSIA - requests against the platelets use never according to the quickly regular pint
+ 13 UNITED KINGDOM - eans boost carefully special requests. accounts are. carefull
+ 11 UNITED STATES - y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be
.. _`Substrait`: https://substrait.io/
.. _`Substrait Java`: https://github.com/substrait-io/substrait-java
diff --git a/docs/source/java/table.rst b/docs/source/java/table.rst
index 603910f516..5aa95e153c 100644
--- a/docs/source/java/table.rst
+++ b/docs/source/java/table.rst
@@ -75,7 +75,7 @@ Tables are created from a ``VectorSchemaRoot`` as shown below. The memory buffer
Table t = new Table(someVectorSchemaRoot);
-If you now update the vectors held by the ``VectorSchemaRoot`` (using some version of `ValueVector#setSafe()`), it would reflect those changes, but the values in table *t* are unchanged.
+If you now update the vectors held by the ``VectorSchemaRoot`` (using some version of ``ValueVector#setSafe()``), it would reflect those changes, but the values in table *t* are unchanged.
Creating a Table from FieldVectors
**********************************
@@ -243,7 +243,7 @@ It is important to recognize that rows are NOT reified as
objects, but rather op
Getting a row
*************
-Calling `immutableRow()` on any table instance returns a new ``Row`` instance.
+Calling ``immutableRow()`` on any table instance returns a new ``Row`` instance.
.. code-block:: Java
@@ -262,7 +262,7 @@ Since rows are iterable, you can traverse a table using a standard while loop:
// do something useful here
}
-``Table`` implements `Iterable<Row>` so you can access rows directly from a table in an enhanced *for* loop:
+``Table`` implements ``Iterable<Row>`` so you can access rows directly from a table in an enhanced *for* loop:
.. code-block:: Java
@@ -272,7 +272,7 @@ Since rows are iterable, you can traverse a table using a standard while loop:
...
}
-Finally, while rows are usually iterated in the order of the underlying data vectors, they are also positionable using the `Row#setPosition()` method, so you can skip to a specific row. Row numbers are 0-based.
+Finally, while rows are usually iterated in the order of the underlying data vectors, they are also positionable using the ``Row#setPosition()`` method, so you can skip to a specific row. Row numbers are 0-based.
.. code-block:: Java
@@ -281,7 +281,7 @@ Finally, while rows are usually iterated in the order of the underlying data vec
Any changes to position are applied to all the columns in the table.
-Note that you must call `next()` or `setPosition()` before accessing values via a row. Failure to do so results in a runtime exception.
+Note that you must call ``next()`` or ``setPosition()`` before accessing values via a row. Failure to do so results in a runtime exception.
Read operations using rows
**************************
@@ -304,7 +304,7 @@ You can also get value using a nullable ``ValueHolder``. For example:
This can be used to retrieve values without creating a new Object for each.
-In addition to getting values, you can check if a value is null using `isNull()`. This is important if the vector contains any nulls, as asking for a value from a vector can cause NullPointerExceptions in some cases.
+In addition to getting values, you can check if a value is null using ``isNull()``. This is important if the vector contains any nulls, as asking for a value from a vector can cause NullPointerExceptions in some cases.
.. code-block:: Java
@@ -352,13 +352,13 @@ Working with the C-Data interface
The ability to work with native code is required for many Arrow features. This
section describes how tables can be exported for use with native code
-Exporting works by converting the data to a ``VectorSchemaRoot`` instance and using the existing facilities to transfer the data. You could do it yourself, but that isn't ideal because conversion to a vector schema root breaks the immutability guarantees. Using the `exportTable()` methods in the `Data`_ class avoids this concern.
+Exporting works by converting the data to a ``VectorSchemaRoot`` instance and using the existing facilities to transfer the data. You could do it yourself, but that isn't ideal because conversion to a vector schema root breaks the immutability guarantees. Using the ``exportTable()`` methods in the `Data`_ class avoids this concern.
.. code-block:: Java
Data.exportTable(bufferAllocator, table, dictionaryProvider,
outArrowArray);
-If the table contains dictionary-encoded vectors and was constructed with a ``DictionaryProvider``, the provider argument to `exportTable()` can be omitted and the table's provider attribute will be used:
+If the table contains dictionary-encoded vectors and was constructed with a ``DictionaryProvider``, the provider argument to ``exportTable()`` can be omitted and the table's provider attribute will be used:
.. code-block:: Java
diff --git a/docs/source/python/api/compute.rst b/docs/source/python/api/compute.rst
index f2ac6bd1e1..5423eebfba 100644
--- a/docs/source/python/api/compute.rst
+++ b/docs/source/python/api/compute.rst
@@ -173,7 +173,7 @@ variants which detect domain errors where appropriate.
Comparisons
-----------
-These functions expect two inputs of the same type. If one of the inputs is `null`
+These functions expect two inputs of the same type. If one of the inputs is ``null``
they return ``null``.
.. autosummary::
diff --git a/docs/source/python/data.rst b/docs/source/python/data.rst
index 9156157fcd..f17475138c 100644
--- a/docs/source/python/data.rst
+++ b/docs/source/python/data.rst
@@ -76,7 +76,7 @@ We use the name **logical type** because the **physical** storage may be the
same for one or more types. For example, ``int64``, ``float64``, and
``timestamp[ms]`` all occupy 64 bits per value.
-These objects are `metadata`; they are used for describing the data in arrays,
+These objects are ``metadata``; they are used for describing the data in arrays,
schemas, and record batches. In Python, they can be used in functions where the
input data (e.g. Python objects) may be coerced to more than one Arrow type.
@@ -99,7 +99,7 @@ types' children. For example, we can define a list of int32 values with:
t6 = pa.list_(t1)
t6
-A `struct` is a collection of named fields:
+A ``struct`` is a collection of named fields:
.. ipython:: python
diff --git a/docs/source/python/extending_types.rst b/docs/source/python/extending_types.rst
index 8df0ef0b1f..83fce84f47 100644
--- a/docs/source/python/extending_types.rst
+++ b/docs/source/python/extending_types.rst
@@ -101,7 +101,7 @@ define the ``__arrow_array__`` method to return an Arrow array::
import pyarrow
return pyarrow.array(..., type=type)
-The ``__arrow_array__`` method takes an optional `type` keyword which is passed
+The ``__arrow_array__`` method takes an optional ``type`` keyword which is passed
through from :func:`pyarrow.array`. The method is allowed to return either
a :class:`~pyarrow.Array` or a :class:`~pyarrow.ChunkedArray`.
diff --git a/docs/source/python/filesystems.rst b/docs/source/python/filesystems.rst
index 22f983a60c..23d10aaaad 100644
--- a/docs/source/python/filesystems.rst
+++ b/docs/source/python/filesystems.rst
@@ -182,7 +182,7 @@ Example how you can read contents from a S3 bucket::
Note that it is important to configure :class:`S3FileSystem` with the correct
-region for the bucket being used. If `region` is not set, the AWS SDK will
+region for the bucket being used. If ``region`` is not set, the AWS SDK will
choose a value, defaulting to 'us-east-1' if the SDK version is <1.8.
Otherwise it will try to use a variety of heuristics (environment variables,
configuration profile, EC2 metadata server) to resolve the region.
@@ -277,7 +277,7 @@ load time, since the library may not be in your LD_LIBRARY_PATH), and relies on
some environment variables.
* ``HADOOP_HOME``: the root of your installed Hadoop distribution. Often has
- `lib/native/libhdfs.so`.
+ ``lib/native/libhdfs.so``.
* ``JAVA_HOME``: the location of your Java SDK installation.
diff --git a/docs/source/python/install.rst b/docs/source/python/install.rst
index 4b966e6d26..12555c9306 100644
--- a/docs/source/python/install.rst
+++ b/docs/source/python/install.rst
@@ -83,7 +83,7 @@ While Arrow uses the OS-provided timezone database on Linux and macOS, it requir
user-provided database on Windows. To download and extract the text version of
the IANA timezone database follow the instructions in the C++
:ref:`download-timezone-database` or use pyarrow utility function
-`pyarrow.util.download_tzdata_on_windows()` that does the same.
+``pyarrow.util.download_tzdata_on_windows()`` that does the same.
By default, the timezone database will be detected at
``%USERPROFILE%\Downloads\tzdata``.
If the database has been downloaded in a different location, you will need to set
diff --git a/docs/source/python/integration/extending.rst b/docs/source/python/integration/extending.rst
index b380fea7e9..d4d099bcf4 100644
--- a/docs/source/python/integration/extending.rst
+++ b/docs/source/python/integration/extending.rst
@@ -474,7 +474,7 @@ Toolchain Compatibility (Linux)
The Python wheels for Linux are built using the
`PyPA manylinux images <https://quay.io/organization/pypa>`_ which use
-the CentOS `devtoolset-9`. In addition to the other notes
+the CentOS ``devtoolset-9``. In addition to the other notes
above, if you are compiling C++ using these shared libraries, you will need
to make sure you use a compatible toolchain as well or you might see a
segfault during runtime.
diff --git a/docs/source/python/memory.rst b/docs/source/python/memory.rst
index 23474b9237..7b49d48ab2 100644
--- a/docs/source/python/memory.rst
+++ b/docs/source/python/memory.rst
@@ -46,7 +46,7 @@ parent-child relationships.
There are many implementations of ``arrow::Buffer``, but they all provide a
standard interface: a data pointer and length. This is similar to Python's
-built-in `buffer protocol` and ``memoryview`` objects.
+built-in ``buffer protocol`` and ``memoryview`` objects.
A :class:`Buffer` can be created from any Python object implementing
the buffer protocol by calling the :func:`py_buffer` function. Let's consider
diff --git a/docs/source/python/timestamps.rst b/docs/source/python/timestamps.rst
index cecbd5b595..80a1b7280c 100644
--- a/docs/source/python/timestamps.rst
+++ b/docs/source/python/timestamps.rst
@@ -24,7 +24,7 @@ Arrow/Pandas Timestamps
Arrow timestamps are stored as a 64-bit integer with column metadata to
associate a time unit (e.g. milliseconds, microseconds, or nanoseconds), and an
-optional time zone. Pandas (`Timestamp`) uses a 64-bit integer representing
+optional time zone. Pandas (``Timestamp``) uses a 64-bit integer representing
nanoseconds and an optional time zone.
Python/Pandas timestamp types without an associated time zone are referred to as
"Time Zone Naive". Python/Pandas timestamp types with an associated time zone are