This is an automated email from the ASF dual-hosted git repository.
alenka pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 07a30d9a57 GH-41611: [Docs][CI] Enable most sphinx-lint rules for documentation (#41612)
07a30d9a57 is described below
commit 07a30d9a5784852187d100660325b8c12b4ff6c8
Author: Bryce Mecum <[email protected]>
AuthorDate: Thu May 16 03:30:14 2024 -0800
GH-41611: [Docs][CI] Enable most sphinx-lint rules for documentation (#41612)
### Rationale for this change
https://github.com/apache/arrow/issues/41611
### What changes are included in this PR?
- Update the pre-commit config to enable all sphinx-lint checks except `dangling-hyphen` and `line-too-long`
- Associated documentation fixes
### Are these changes tested?
Yes, by building and looking at the docs locally.
### Are there any user-facing changes?
Just docs.
* GitHub Issue: #41611
Authored-by: Bryce Mecum <[email protected]>
Signed-off-by: AlenkaF <[email protected]>
---
.pre-commit-config.yaml | 10 +++++++--
docs/source/conf.py | 2 +-
docs/source/cpp/acero/developer_guide.rst | 10 ++++-----
docs/source/cpp/acero/overview.rst | 26 +++++++++++-----------
docs/source/cpp/acero/user_guide.rst | 8 +++----
docs/source/cpp/build_system.rst | 2 +-
docs/source/cpp/compute.rst | 18 +++++++--------
docs/source/developers/cpp/building.rst | 2 +-
docs/source/developers/documentation.rst | 2 +-
.../guide/step_by_step/arrow_codebase.rst | 4 ++--
.../developers/guide/step_by_step/set_up.rst | 8 +++----
docs/source/developers/java/development.rst | 2 +-
docs/source/developers/release.rst | 4 ++--
docs/source/format/CanonicalExtensions.rst | 4 ++--
docs/source/format/Columnar.rst | 6 ++---
docs/source/format/FlightSql.rst | 2 +-
docs/source/format/Integration.rst | 2 +-
docs/source/java/algorithm.rst | 2 +-
docs/source/java/flight_sql_jdbc_driver.rst | 2 +-
docs/source/java/install.rst | 2 +-
docs/source/java/ipc.rst | 2 +-
docs/source/java/quickstartguide.rst | 16 ++++++-------
docs/source/java/substrait.rst | 20 ++++++++---------
docs/source/java/table.rst | 16 ++++++-------
docs/source/python/api/compute.rst | 2 +-
docs/source/python/data.rst | 4 ++--
docs/source/python/extending_types.rst | 2 +-
docs/source/python/filesystems.rst | 4 ++--
docs/source/python/install.rst | 2 +-
docs/source/python/integration/extending.rst | 2 +-
docs/source/python/memory.rst | 2 +-
docs/source/python/timestamps.rst | 2 +-
32 files changed, 99 insertions(+), 93 deletions(-)
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index bf5ca08d53..7dcc1c9816 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -136,5 +136,11 @@ repos:
rev: v0.9.1
hooks:
- id: sphinx-lint
- files: ^docs/
- args: ['--disable', 'all', '--enable', 'trailing-whitespace,missing-final-newline', 'docs']
+ files: ^docs/source
+ exclude: ^docs/source/python/generated
+ args: [
+ '--enable',
+ 'all',
+ '--disable',
+ 'dangling-hyphen,line-too-long',
+ ]
diff --git a/docs/source/conf.py b/docs/source/conf.py
index b487200555..1e6c113e33 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -535,7 +535,7 @@ latex_documents = [
#
# latex_appendices = []
-# It false, will not define \strong, \code, itleref, \crossref ... but only
+# It false, will not define \strong, \code, \titleref, \crossref ... but only
# \sphinxstrong, ..., \sphinxtitleref, ... To help avoid clash with user added
# packages.
#
diff --git a/docs/source/cpp/acero/developer_guide.rst b/docs/source/cpp/acero/developer_guide.rst
index 80ca68556f..7dd08fe3ce 100644
--- a/docs/source/cpp/acero/developer_guide.rst
+++ b/docs/source/cpp/acero/developer_guide.rst
@@ -327,8 +327,8 @@ An engine could choose to create a thread task for every execution of a node. H
this leads to problems with cache locality. For example, let's assume we have a basic plan consisting of three
exec nodes, scan, project, and then filter (this is a very common use case). Now let's assume there are 100 batches.
In a task-per-operator model we would have tasks like "Scan Batch 5", "Project Batch 5", and "Filter Batch 5". Each
-of those tasks is potentially going to access the same data. For example, maybe the `project` and `filter` nodes need
-to read the same column. A column which is intially created in a decode phase of the `scan` node. To maximize cache
+of those tasks is potentially going to access the same data. For example, maybe the ``project`` and ``filter`` nodes need
+to read the same column. A column which is intially created in a decode phase of the ``scan`` node. To maximize cache
utilization we would need to carefully schedule our tasks to ensure that all three of those tasks are run consecutively
and assigned to the same CPU core.
@@ -412,7 +412,7 @@ Ordered Execution
=================
Some nodes either establish an ordering to their outgoing batches or they need to be able to process batches in order.
-Acero handles ordering using the `batch_index` property on an ExecBatch. If a node has a deterministic output order
+Acero handles ordering using the ``batch_index`` property on an ExecBatch. If a node has a deterministic output order
then it should apply a batch index on batches that it emits. For example, the OrderByNode applies a new ordering to
batches (regardless of the incoming ordering). The scan node is able to attach an implicit ordering to batches which
reflects the order of the rows in the files being scanned.
@@ -461,8 +461,8 @@ Acero's tracing is currently half-implemented and there are major gaps in profil
effort at tracing with open telemetry and most of the necessary pieces are in place. The main thing currently lacking is
some kind of effective visualization of the tracing results.
-In order to use the tracing that is present today you will need to build with Arrow with `ARROW_WITH_OPENTELEMETRY=ON`.
-Then you will need to set the environment variable `ARROW_TRACING_BACKEND=otlp_http`. This will configure open telemetry
+In order to use the tracing that is present today you will need to build with Arrow with ``ARROW_WITH_OPENTELEMETRY=ON``.
+Then you will need to set the environment variable ``ARROW_TRACING_BACKEND=otlp_http``. This will configure open telemetry
to export trace results (as OTLP) to the HTTP endpoint http://localhost:4318/v1/traces. You will need to configure an
open telemetry collector to collect results on that endpoint and you will need to configure a trace viewer of some kind
such as Jaeger: https://www.jaegertracing.io/docs/1.21/opentelemetry/
diff --git a/docs/source/cpp/acero/overview.rst b/docs/source/cpp/acero/overview.rst
index 8be4cbc1b1..34e0b143bc 100644
--- a/docs/source/cpp/acero/overview.rst
+++ b/docs/source/cpp/acero/overview.rst
@@ -209,16 +209,16 @@ must have the same length. There are a few key differences from ExecBatch:
Both the record batch and the exec batch have strong ownership of the arrays & buffers
-* An `ExecBatch` does not have a schema. This is because an `ExecBatch` is assumed to be
+* An ``ExecBatch`` does not have a schema. This is because an ``ExecBatch`` is assumed to be
part of a stream of batches and the stream is assumed to have a consistent schema. So
- the schema for an `ExecBatch` is typically stored in the ExecNode.
-* Columns in an `ExecBatch` are either an `Array` or a `Scalar`. When a column is a `Scalar`
- this means that the column has a single value for every row in the batch. An `ExecBatch`
+ the schema for an ``ExecBatch`` is typically stored in the ExecNode.
+* Columns in an ``ExecBatch`` are either an ``Array`` or a ``Scalar``. When a column is a ``Scalar``
+ this means that the column has a single value for every row in the batch. An ``ExecBatch``
also has a length property which describes how many rows are in a batch. So another way to
- view a `Scalar` is a constant array with `length` elements.
-* An `ExecBatch` contains additional information used by the exec plan. For example, an
- `index` can be used to describe a batch's position in an ordered stream. We expect
- that `ExecBatch` will also evolve to contain additional fields such as a selection vector.
+ view a ``Scalar`` is a constant array with ``length`` elements.
+* An ``ExecBatch`` contains additional information used by the exec plan. For example, an
+ ``index`` can be used to describe a batch's position in an ordered stream. We expect
+ that ``ExecBatch`` will also evolve to contain additional fields such as a selection vector.
.. figure:: scalar_vs_array.svg
@@ -231,8 +231,8 @@ only zero copy if there are no scalars in the exec batch.
.. note::
Both Acero and the compute module have "lightweight" versions of batches and arrays.
- In the compute module these are called `BatchSpan`, `ArraySpan`, and `BufferSpan`. In
- Acero the concept is called `KeyColumnArray`. These types were developed concurrently
+ In the compute module these are called ``BatchSpan``, ``ArraySpan``, and ``BufferSpan``. In
+ Acero the concept is called ``KeyColumnArray``. These types were developed concurrently
and serve the same purpose. They aim to provide an array container that can be completely
stack allocated (provided the data type is non-nested) in order to avoid heap allocation
overhead. Ideally these two concepts will be merged someday.
@@ -247,9 +247,9 @@ execution of the nodes. Both ExecPlan and ExecNode are tied to the lifecycle of
They have state and are not expected to be restartable.
.. warning::
- The structures within Acero, including `ExecBatch`, are still experimental. The `ExecBatch`
- class should not be used outside of Acero. Instead, an `ExecBatch` should be converted to
- a more standard structure such as a `RecordBatch`.
+ The structures within Acero, including ``ExecBatch``, are still experimental. The ``ExecBatch``
+ class should not be used outside of Acero. Instead, an ``ExecBatch`` should be converted to
+ a more standard structure such as a ``RecordBatch``.
Similarly, an ExecPlan is an internal concept. Users creating plans should be using Declaration
objects. APIs for consuming and executing plans should abstract away the details of the underlying
diff --git a/docs/source/cpp/acero/user_guide.rst b/docs/source/cpp/acero/user_guide.rst
index adcc17216e..0271be2180 100644
--- a/docs/source/cpp/acero/user_guide.rst
+++ b/docs/source/cpp/acero/user_guide.rst
@@ -455,8 +455,8 @@ can be selected from :ref:`this list of aggregation functions
will be added which should alleviate this constraint.
The aggregation can provide results as a group or scalar. For instances,
-an operation like `hash_count` provides the counts per each unique record
-as a grouped result while an operation like `sum` provides a single record.
+an operation like ``hash_count`` provides the counts per each unique record
+as a grouped result while an operation like ``sum`` provides a single record.
Scalar Aggregation example:
@@ -490,7 +490,7 @@ caller will repeatedly call this function until the generator function is exhaus
will accumulate in memory. An execution plan should only have one
"terminal" node (one sink node). An :class:`ExecPlan` can terminate early due to cancellation or
an error, before the output is fully consumed. However, the plan can be safely destroyed independently
-of the sink, which will hold the unconsumed batches by `exec_plan->finished()`.
+of the sink, which will hold the unconsumed batches by ``exec_plan->finished()``.
As a part of the Source Example, the Sink operation is also included;
@@ -515,7 +515,7 @@ The consuming function may be called before a previous invocation has completed.
function does not run quickly enough then many concurrent executions could pile up, blocking the
CPU thread pool. The execution plan will not be marked finished until all consuming function callbacks
have been completed.
-Once all batches have been delivered the execution plan will wait for the `finish` future to complete
+Once all batches have been delivered the execution plan will wait for the ``finish`` future to complete
before marking the execution plan finished. This allows for workflows where the consumption function
converts batches into async tasks (this is currently done internally for the dataset write node).
diff --git a/docs/source/cpp/build_system.rst b/docs/source/cpp/build_system.rst
index 0c94d7e5ce..e80bca4c94 100644
--- a/docs/source/cpp/build_system.rst
+++ b/docs/source/cpp/build_system.rst
@@ -167,7 +167,7 @@ file into an executable linked with the Arrow C++ shared library:
.. code-block:: makefile
my_example: my_example.cc
- $(CXX) -o $@ $(CXXFLAGS) $< $$(pkg-config --cflags --libs arrow)
+ $(CXX) -o $@ $(CXXFLAGS) $< $$(pkg-config --cflags --libs arrow)
Many build systems support pkg-config. For example:
diff --git a/docs/source/cpp/compute.rst b/docs/source/cpp/compute.rst
index 546b6e5716..701c7d573a 100644
--- a/docs/source/cpp/compute.rst
+++ b/docs/source/cpp/compute.rst
@@ -514,8 +514,8 @@ Mixed time resolution temporal inputs will be cast to finest input resolution.
+------------+---------------------------------------------+
It's compatible with Redshift's decimal promotion rules. All decimal digits
- are preserved for `add`, `subtract` and `multiply` operations. The result
- precision of `divide` is at least the sum of precisions of both operands with
+ are preserved for ``add``, ``subtract`` and ``multiply`` operations. The result
+ precision of ``divide`` is at least the sum of precisions of both operands with
enough scale kept. Error is returned if the result precision is beyond the
decimal value range.
@@ -1029,7 +1029,7 @@ These functions trim off characters on both sides (trim), or the left (ltrim) or
+--------------------------+------------+-------------------------+---------------------+----------------------------------------+---------+
* \(1) Only characters specified in :member:`TrimOptions::characters` will be
- trimmed off. Both the input string and the `characters` argument are
+ trimmed off. Both the input string and the ``characters`` argument are
interpreted as ASCII characters.
* \(2) Only trim off ASCII whitespace characters (``'\t'``, ``'\n'``, ``'\v'``,
@@ -1570,7 +1570,7 @@ is the same, even though the UTC years would be different.
Timezone handling
~~~~~~~~~~~~~~~~~
-`assume_timezone` function is meant to be used when an external system produces
+``assume_timezone`` function is meant to be used when an external system produces
"timezone-naive" timestamps which need to be converted to "timezone-aware"
timestamps (see for example the `definition
<https://docs.python.org/3/library/datetime.html#aware-and-naive-objects>`__
@@ -1581,11 +1581,11 @@ Input timestamps are assumed to be relative to the timezone given in
UTC-relative timestamps with the timezone metadata set to the above value.
An error is returned if the timestamps already have the timezone metadata set.
-`local_timestamp` function converts UTC-relative timestamps to local "timezone-naive"
+``local_timestamp`` function converts UTC-relative timestamps to local "timezone-naive"
timestamps. The timezone is taken from the timezone metadata of the input
-timestamps. This function is the inverse of `assume_timezone`. Please note:
+timestamps. This function is the inverse of ``assume_timezone``. Please note:
**all temporal functions already operate on timestamps as if they were in local
-time of the metadata provided timezone**. Using `local_timestamp` is only meant to be
+time of the metadata provided timezone**. Using ``local_timestamp`` is only meant to be
used when an external system expects local timestamps.
+-----------------+-------+-------------+---------------+---------------------------------+-------+
@@ -1649,8 +1649,8 @@ overflow is detected.
* \(1) CumulativeOptions has two optional parameters. The first parameter
:member:`CumulativeOptions::start` is a starting value for the running
- accumulation. It has a default value of 0 for `sum`, 1 for `prod`, min of
- input type for `max`, and max of input type for `min`. Specified values of
+ accumulation. It has a default value of 0 for ``sum``, 1 for ``prod``, min of
+ input type for ``max``, and max of input type for ``min``. Specified values of
``start`` must be castable to the input type. The second parameter
:member:`CumulativeOptions::skip_nulls` is a boolean. When set to
false (the default), the first encountered null is propagated. When set to
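The timezone hunks above describe ``assume_timezone`` (attach a zone to naive timestamps, erroring if one is already set) and ``local_timestamp`` (its inverse). A rough stdlib analogy in Python, purely illustrative and not the Arrow compute API:

```python
from datetime import datetime, timedelta, timezone

def assume_timezone(naive: datetime, tz: timezone) -> datetime:
    """Interpret a timezone-naive timestamp as local time in tz."""
    if naive.tzinfo is not None:
        # mirrors the docs: error if timezone metadata is already set
        raise ValueError("timestamp already has timezone metadata")
    return naive.replace(tzinfo=tz)

def local_timestamp(aware: datetime) -> datetime:
    """Inverse: drop the zone, keeping the local wall-clock time."""
    return aware.replace(tzinfo=None)

tz = timezone(timedelta(hours=2))
t = datetime(2024, 5, 16, 12, 0)
assert local_timestamp(assume_timezone(t, tz)) == t  # round-trips
```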
diff --git a/docs/source/developers/cpp/building.rst b/docs/source/developers/cpp/building.rst
index 7b80d2138c..b052b856c9 100644
--- a/docs/source/developers/cpp/building.rst
+++ b/docs/source/developers/cpp/building.rst
@@ -312,7 +312,7 @@ depends on ``python`` being available).
On some Linux distributions, running the test suite might require setting an
explicit locale. If you see any locale-related errors, try setting the
-environment variable (which requires the `locales` package or equivalent):
+environment variable (which requires the ``locales`` package or equivalent):
.. code-block::
diff --git a/docs/source/developers/documentation.rst b/docs/source/developers/documentation.rst
index 8b1ea28c0f..a479065f62 100644
--- a/docs/source/developers/documentation.rst
+++ b/docs/source/developers/documentation.rst
@@ -259,7 +259,7 @@ Build the docs in the target directory:
sphinx-build ./source/developers ./source/developers/_build -c ./source -D master_doc=temp_index
This builds everything in the target directory to a folder inside of it
-called ``_build`` using the config file in the `source` directory.
+called ``_build`` using the config file in the ``source`` directory.
Once you have verified the HTML documents, you can remove temporary index file:
diff --git a/docs/source/developers/guide/step_by_step/arrow_codebase.rst b/docs/source/developers/guide/step_by_step/arrow_codebase.rst
index 0beece991b..0c194ab3a3 100644
--- a/docs/source/developers/guide/step_by_step/arrow_codebase.rst
+++ b/docs/source/developers/guide/step_by_step/arrow_codebase.rst
@@ -99,8 +99,8 @@ can be called from a function in another language. After a function is defined
C++ we must create the binding manually to use it in that implementation.
.. note::
- There is much you can learn by checking **Pull Requests**
- and **unit tests** for similar issues.
+ There is much you can learn by checking **Pull Requests**
+ and **unit tests** for similar issues.
.. tab-set::
diff --git a/docs/source/developers/guide/step_by_step/set_up.rst b/docs/source/developers/guide/step_by_step/set_up.rst
index 9a2177568d..9c808ceee7 100644
--- a/docs/source/developers/guide/step_by_step/set_up.rst
+++ b/docs/source/developers/guide/step_by_step/set_up.rst
@@ -118,10 +118,10 @@ Should give you a result similar to this:
.. code:: console
- origin https://github.com/<your username>/arrow.git (fetch)
- origin https://github.com/<your username>/arrow.git (push)
- upstream https://github.com/apache/arrow (fetch)
- upstream https://github.com/apache/arrow (push)
+ origin https://github.com/<your username>/arrow.git (fetch)
+ origin https://github.com/<your username>/arrow.git (push)
+ upstream https://github.com/apache/arrow (fetch)
+ upstream https://github.com/apache/arrow (push)
If you did everything correctly, you should now have a copy of the code
in the ``arrow`` directory and two remotes that refer to your own GitHub
diff --git a/docs/source/developers/java/development.rst b/docs/source/developers/java/development.rst
index 17d47c324c..3f0ff6cdd0 100644
--- a/docs/source/developers/java/development.rst
+++ b/docs/source/developers/java/development.rst
@@ -118,7 +118,7 @@ This checks the code style of all source code under the current directory or fro
$ mvn checkstyle:check
-Maven `pom.xml` style is enforced with Spotless using `Apache Maven pom.xml guidelines`_
+Maven ``pom.xml`` style is enforced with Spotless using `Apache Maven pom.xml guidelines`_
You can also just check the style without building the project.
This checks the style of all pom.xml files under the current directory or from within an individual module.
diff --git a/docs/source/developers/release.rst b/docs/source/developers/release.rst
index 0b3a83dc5a..d903cc71bd 100644
--- a/docs/source/developers/release.rst
+++ b/docs/source/developers/release.rst
@@ -106,7 +106,7 @@ If there is consensus and there is a Release Manager willing to take the effort
the release a patch release can be created.
Committers can tag issues that should be included on the next patch release using the
-`backport-candidate` label. Is the responsability of the author or the committer to add the
+``backport-candidate`` label. Is the responsability of the author or the committer to add the
label to the issue to help the Release Manager identify the issues that should be backported.
If a specific issue is identified as the reason to create a patch release the Release Manager
@@ -117,7 +117,7 @@ Be sure to go through on the following checklist:
#. Create milestone
#. Create maintenance branch
#. Include issue that was requested as requiring new patch release
-#. Add new milestone to issues with `backport-candidate` label
+#. Add new milestone to issues with ``backport-candidate`` label
#. cherry-pick issues into maintenance branch
Creating a Release Candidate
diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst
index c60f095dd3..c258f889dc 100644
--- a/docs/source/format/CanonicalExtensions.rst
+++ b/docs/source/format/CanonicalExtensions.rst
@@ -77,7 +77,7 @@ Official List
Fixed shape tensor
==================
-* Extension name: `arrow.fixed_shape_tensor`.
+* Extension name: ``arrow.fixed_shape_tensor``.
* The storage type of the extension: ``FixedSizeList`` where:
@@ -153,7 +153,7 @@ Fixed shape tensor
Variable shape tensor
=====================
-* Extension name: `arrow.variable_shape_tensor`.
+* Extension name: ``arrow.variable_shape_tensor``.
* The storage type of the extension is: ``StructArray`` where struct
is composed of **data** and **shape** fields describing a single
diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst
index ec6a7fa5e3..7c853de782 100644
--- a/docs/source/format/Columnar.rst
+++ b/docs/source/format/Columnar.rst
@@ -312,7 +312,7 @@ Each value in this layout consists of 0 or more bytes. While primitive
arrays have a single values buffer, variable-size binary have an
**offsets** buffer and **data** buffer.
-The offsets buffer contains `length + 1` signed integers (either
+The offsets buffer contains ``length + 1`` signed integers (either
32-bit or 64-bit, depending on the logical type), which encode the
start position of each slot in the data buffer. The length of the
value in each slot is computed using the difference between the offset
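The hunk above fixes markup around the offsets-buffer description; the ``length + 1`` rule itself is simple enough to sketch in Python (a toy model of the layout, not Arrow's implementation):

```python
def build_offsets(values):
    """Offsets buffer for a variable-size binary layout: length + 1
    integers, where slot i spans data[offsets[i]:offsets[i + 1]]."""
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v))
    return offsets

data = [b"joe", b"", b"mark"]
print(build_offsets(data))  # [0, 3, 3, 7]
```

The length of each slot is the difference between consecutive offsets, exactly as the surrounding text describes.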
@@ -374,7 +374,7 @@ locations are indicated using a **views** buffer, which may point to one
of potentially several **data** buffers or may contain the characters
inline.
-The views buffer contains `length` view structures with the following layout:
+The views buffer contains ``length`` view structures with the following layout:
::
@@ -394,7 +394,7 @@ should be interpreted.
In the short string case the string's bytes are inlined — stored inside the
view itself, in the twelve bytes which follow the length. Any remaining bytes
-after the string itself are padded with `0`.
+after the string itself are padded with ``0``.
In the long string case, a buffer index indicates which data buffer
stores the data bytes and an offset indicates where in that buffer the
diff --git a/docs/source/format/FlightSql.rst b/docs/source/format/FlightSql.rst
index 9c3523755f..b4b85e77a2 100644
--- a/docs/source/format/FlightSql.rst
+++ b/docs/source/format/FlightSql.rst
@@ -193,7 +193,7 @@ in the ``app_metadata`` field of the Flight RPC ``PutResult`` returned.
When used with DoPut: load the stream of Arrow record batches into
the specified target table and return the number of rows ingested
- via a `DoPutUpdateResult` message.
+ via a ``DoPutUpdateResult`` message.
Flight Server Session Management
--------------------------------
diff --git a/docs/source/format/Integration.rst b/docs/source/format/Integration.rst
index c800255687..436747989a 100644
--- a/docs/source/format/Integration.rst
+++ b/docs/source/format/Integration.rst
@@ -501,7 +501,7 @@ integration testing actually tests.
There are two types of integration test cases: the ones populated on the fly
by the data generator in the Archery utility, and *gold* files that exist
-in the `arrow-testing
<https://github.com/apache/arrow-testing/tree/master/data/arrow-ipc-stream/integration>`
+in the `arrow-testing
<https://github.com/apache/arrow-testing/tree/master/data/arrow-ipc-stream/integration>`_
repository.
Data Generator Tests
diff --git a/docs/source/java/algorithm.rst b/docs/source/java/algorithm.rst
index 06ed32bd48..d4838967d6 100644
--- a/docs/source/java/algorithm.rst
+++ b/docs/source/java/algorithm.rst
@@ -82,7 +82,7 @@ for fixed width and variable width vectors, respectively. Both algorithms run in
3. **Index sorter**: this sorter does not actually sort the vector. Instead, it returns an integer
vector, which correspond to indices of vector elements in sorted order. With the index vector, one can
-easily construct a sorted vector. In addition, some other tasks can be easily achieved, like finding the ``k``th
+easily construct a sorted vector. In addition, some other tasks can be easily achieved, like finding the ``k`` th
smallest value in the vector. Index sorting is supported by ``org.apache.arrow.algorithm.sort.IndexSorter``,
which runs in ``O(nlog(n))`` time. It is applicable to vectors of any type.
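The index-sorter hunk above describes returning sorted indices instead of moving the data; a minimal Python sketch of the same idea (illustrative, not the Arrow Java API):

```python
def index_sort(values):
    """Return indices of `values` in sorted order, without mutating it."""
    return sorted(range(len(values)), key=values.__getitem__)

v = [30, 10, 20]
idx = index_sort(v)
print(idx)        # [1, 2, 0]
print(v[idx[0]])  # 10, i.e. the smallest value, found via the index vector
```

Walking `idx` in order yields the sorted sequence, which is how the k-th smallest value is found without constructing a sorted copy.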
diff --git a/docs/source/java/flight_sql_jdbc_driver.rst b/docs/source/java/flight_sql_jdbc_driver.rst
index cc8822247b..f95c2ac755 100644
--- a/docs/source/java/flight_sql_jdbc_driver.rst
+++ b/docs/source/java/flight_sql_jdbc_driver.rst
@@ -162,7 +162,7 @@ the Flight SQL service as gRPC headers. For example, the following URI ::
This will connect without authentication or encryption, to a Flight
SQL service running on ``localhost`` on port 12345. Each request will
-also include a `database=mydb` gRPC header.
+also include a ``database=mydb`` gRPC header.
Connection parameters may also be supplied using the Properties object
when using the JDBC Driver Manager to connect. When supplying using
diff --git a/docs/source/java/install.rst b/docs/source/java/install.rst
index a551edc36c..dc6a55c87f 100644
--- a/docs/source/java/install.rst
+++ b/docs/source/java/install.rst
@@ -63,7 +63,7 @@ Modifying the command above for Flight:
Otherwise, you may see errors like ``java.lang.IllegalAccessError: superclass
access check failed: class
org.apache.arrow.flight.ArrowMessage$ArrowBufRetainingCompositeByteBuf (in
module org.apache.arrow.flight.core)
cannot access class io.netty.buffer.CompositeByteBuf (in unnamed module ...)
because module
-org.apache.arrow.flight.core does not read unnamed module ...
+org.apache.arrow.flight.core does not read unnamed module ...``
Finally, if you are using arrow-dataset, you'll also need to report that JDK
internals need to be exposed.
Modifying the command above for arrow-memory:
diff --git a/docs/source/java/ipc.rst b/docs/source/java/ipc.rst
index 01341ff2cc..f593917917 100644
--- a/docs/source/java/ipc.rst
+++ b/docs/source/java/ipc.rst
@@ -81,7 +81,7 @@ Here we used an in-memory stream, but this could have been a socket or some othe
writer.end();
Note that, since the :class:`VectorSchemaRoot` in the writer is a container
that can hold batches, batches flow through
-:class:`VectorSchemaRoot` as part of a pipeline, so we need to populate data before `writeBatch`, so that later batches
+:class:`VectorSchemaRoot` as part of a pipeline, so we need to populate data before ``writeBatch``, so that later batches
could overwrite previous ones.
Now the :class:`ByteArrayOutputStream` contains the complete stream which
contains 5 record batches.
diff --git a/docs/source/java/quickstartguide.rst b/docs/source/java/quickstartguide.rst
index a71ddc5b5e..1f3ec861d3 100644
--- a/docs/source/java/quickstartguide.rst
+++ b/docs/source/java/quickstartguide.rst
@@ -195,10 +195,10 @@ Example: Create a dataset of names (strings) and ages (32-bit signed integers).
.. code-block:: shell
VectorSchemaRoot created:
- age name
- 10 Dave
- 20 Peter
- 30 Mary
+ age name
+ 10 Dave
+ 20 Peter
+ 30 Mary
Interprocess Communication (IPC)
@@ -306,10 +306,10 @@ Example: Read the dataset from the previous example from an Arrow IPC file (rand
Record batches in file: 1
VectorSchemaRoot read:
- age name
- 10 Dave
- 20 Peter
- 30 Mary
+ age name
+ 10 Dave
+ 20 Peter
+ 30 Mary
More examples available at `Arrow Java Cookbook`_.
diff --git a/docs/source/java/substrait.rst b/docs/source/java/substrait.rst
index c5857dcc23..fa20dbd61d 100644
--- a/docs/source/java/substrait.rst
+++ b/docs/source/java/substrait.rst
@@ -100,9 +100,9 @@ Here is an example of a Java program that queries a Parquet file using Java Subs
.. code-block:: text
// Results example:
- FieldPath(0) FieldPath(1) FieldPath(2) FieldPath(3)
- 0 ALGERIA 0 haggle. carefully final deposits detect slyly agai
- 1 ARGENTINA 1 al foxes promise slyly according to the regular accounts. bold requests alon
+ FieldPath(0) FieldPath(1) FieldPath(2) FieldPath(3)
+ 0 ALGERIA 0 haggle. carefully final deposits detect slyly agai
+ 1 ARGENTINA 1 al foxes promise slyly according to the regular accounts. bold requests alon
Executing Projections and Filters Using Extended Expressions
============================================================
@@ -189,13 +189,13 @@ This Java program:
.. code-block:: text
- column-1 column-2
- 13 ROMANIA - ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account
- 14 SAUDI ARABIA - ts. silent requests haggle. closely express packages sleep across the blithely
- 12 VIETNAM - hely enticingly express accounts. even, final
- 13 RUSSIA - requests against the platelets use never according to the quickly regular pint
- 13 UNITED KINGDOM - eans boost carefully special requests. accounts are. carefull
- 11 UNITED STATES - y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be
+ column-1 column-2
+ 13 ROMANIA - ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account
+ 14 SAUDI ARABIA - ts. silent requests haggle. closely express packages sleep across the blithely
+ 12 VIETNAM - hely enticingly express accounts. even, final
+ 13 RUSSIA - requests against the platelets use never according to the quickly regular pint
+ 13 UNITED KINGDOM - eans boost carefully special requests. accounts are. carefull
+ 11 UNITED STATES - y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be
.. _`Substrait`: https://substrait.io/
.. _`Substrait Java`: https://github.com/substrait-io/substrait-java
diff --git a/docs/source/java/table.rst b/docs/source/java/table.rst
index 603910f516..5aa95e153c 100644
--- a/docs/source/java/table.rst
+++ b/docs/source/java/table.rst
@@ -75,7 +75,7 @@ Tables are created from a ``VectorSchemaRoot`` as shown below. The memory buffer
Table t = new Table(someVectorSchemaRoot);
-If you now update the vectors held by the ``VectorSchemaRoot`` (using some version of `ValueVector#setSafe()`), it would reflect those changes, but the values in table *t* are unchanged.
+If you now update the vectors held by the ``VectorSchemaRoot`` (using some version of ``ValueVector#setSafe()``), it would reflect those changes, but the values in table *t* are unchanged.
Creating a Table from FieldVectors
**********************************
@@ -243,7 +243,7 @@ It is important to recognize that rows are NOT reified as
objects, but rather op
Getting a row
*************
-Calling `immutableRow()` on any table instance returns a new ``Row`` instance.
+Calling ``immutableRow()`` on any table instance returns a new ``Row`` instance.
.. code-block:: Java
@@ -262,7 +262,7 @@ Since rows are iterable, you can traverse a table using a standard while loop:
// do something useful here
}
-``Table`` implements `Iterable<Row>` so you can access rows directly from a table in an enhanced *for* loop:
+``Table`` implements ``Iterable<Row>`` so you can access rows directly from a table in an enhanced *for* loop:
.. code-block:: Java
@@ -272,7 +272,7 @@ Since rows are iterable, you can traverse a table using a standard while loop:
...
}
-Finally, while rows are usually iterated in the order of the underlying data vectors, they are also positionable using the `Row#setPosition()` method, so you can skip to a specific row. Row numbers are 0-based.
+Finally, while rows are usually iterated in the order of the underlying data vectors, they are also positionable using the ``Row#setPosition()`` method, so you can skip to a specific row. Row numbers are 0-based.
.. code-block:: Java
@@ -281,7 +281,7 @@ Finally, while rows are usually iterated in the order of the underlying data vec
Any changes to position are applied to all the columns in the table.
-Note that you must call `next()` or `setPosition()` before accessing values via a row. Failure to do so results in a runtime exception.
+Note that you must call ``next()`` or ``setPosition()`` before accessing values via a row. Failure to do so results in a runtime exception.
Read operations using rows
**************************
@@ -304,7 +304,7 @@ You can also get value using a nullable ``ValueHolder``. For example:
This can be used to retrieve values without creating a new Object for each.
-In addition to getting values, you can check if a value is null using `isNull()`. This is important if the vector contains any nulls, as asking for a value from a vector can cause NullPointerExceptions in some cases.
+In addition to getting values, you can check if a value is null using ``isNull()``. This is important if the vector contains any nulls, as asking for a value from a vector can cause NullPointerExceptions in some cases.
.. code-block:: Java
@@ -352,13 +352,13 @@ Working with the C-Data interface
The ability to work with native code is required for many Arrow features. This
section describes how tables can be exported for use with native code
-Exporting works by converting the data to a ``VectorSchemaRoot`` instance and using the existing facilities to transfer the data. You could do it yourself, but that isn't ideal because conversion to a vector schema root breaks the immutability guarantees. Using the `exportTable()` methods in the `Data`_ class avoids this concern.
+Exporting works by converting the data to a ``VectorSchemaRoot`` instance and using the existing facilities to transfer the data. You could do it yourself, but that isn't ideal because conversion to a vector schema root breaks the immutability guarantees. Using the ``exportTable()`` methods in the `Data`_ class avoids this concern.
.. code-block:: Java
Data.exportTable(bufferAllocator, table, dictionaryProvider,
outArrowArray);
-If the table contains dictionary-encoded vectors and was constructed with a ``DictionaryProvider``, the provider argument to `exportTable()` can be omitted and the table's provider attribute will be used:
+If the table contains dictionary-encoded vectors and was constructed with a ``DictionaryProvider``, the provider argument to ``exportTable()`` can be omitted and the table's provider attribute will be used:
.. code-block:: Java
diff --git a/docs/source/python/api/compute.rst b/docs/source/python/api/compute.rst
index f2ac6bd1e1..5423eebfba 100644
--- a/docs/source/python/api/compute.rst
+++ b/docs/source/python/api/compute.rst
@@ -173,7 +173,7 @@ variants which detect domain errors where appropriate.
Comparisons
-----------
-These functions expect two inputs of the same type. If one of the inputs is `null`
+These functions expect two inputs of the same type. If one of the inputs is ``null``
they return ``null``.
.. autosummary::
diff --git a/docs/source/python/data.rst b/docs/source/python/data.rst
index 9156157fcd..f17475138c 100644
--- a/docs/source/python/data.rst
+++ b/docs/source/python/data.rst
@@ -76,7 +76,7 @@ We use the name **logical type** because the **physical** storage may be the
same for one or more types. For example, ``int64``, ``float64``, and
``timestamp[ms]`` all occupy 64 bits per value.
-These objects are `metadata`; they are used for describing the data in arrays,
+These objects are ``metadata``; they are used for describing the data in arrays,
schemas, and record batches. In Python, they can be used in functions where the
input data (e.g. Python objects) may be coerced to more than one Arrow type.
@@ -99,7 +99,7 @@ types' children. For example, we can define a list of int32 values with:
t6 = pa.list_(t1)
t6
-A `struct` is a collection of named fields:
+A ``struct`` is a collection of named fields:
.. ipython:: python
diff --git a/docs/source/python/extending_types.rst b/docs/source/python/extending_types.rst
index 8df0ef0b1f..83fce84f47 100644
--- a/docs/source/python/extending_types.rst
+++ b/docs/source/python/extending_types.rst
@@ -101,7 +101,7 @@ define the ``__arrow_array__`` method to return an Arrow array::
import pyarrow
return pyarrow.array(..., type=type)
-The ``__arrow_array__`` method takes an optional `type` keyword which is passed
+The ``__arrow_array__`` method takes an optional ``type`` keyword which is passed
through from :func:`pyarrow.array`. The method is allowed to return either
a :class:`~pyarrow.Array` or a :class:`~pyarrow.ChunkedArray`.
diff --git a/docs/source/python/filesystems.rst b/docs/source/python/filesystems.rst
index 22f983a60c..23d10aaaad 100644
--- a/docs/source/python/filesystems.rst
+++ b/docs/source/python/filesystems.rst
@@ -182,7 +182,7 @@ Example how you can read contents from a S3 bucket::
Note that it is important to configure :class:`S3FileSystem` with the correct
-region for the bucket being used. If `region` is not set, the AWS SDK will
+region for the bucket being used. If ``region`` is not set, the AWS SDK will
choose a value, defaulting to 'us-east-1' if the SDK version is <1.8.
Otherwise it will try to use a variety of heuristics (environment variables,
configuration profile, EC2 metadata server) to resolve the region.
@@ -277,7 +277,7 @@ load time, since the library may not be in your LD_LIBRARY_PATH), and relies on
some environment variables.
* ``HADOOP_HOME``: the root of your installed Hadoop distribution. Often has
- `lib/native/libhdfs.so`.
+ ``lib/native/libhdfs.so``.
* ``JAVA_HOME``: the location of your Java SDK installation.
diff --git a/docs/source/python/install.rst b/docs/source/python/install.rst
index 4b966e6d26..12555c9306 100644
--- a/docs/source/python/install.rst
+++ b/docs/source/python/install.rst
@@ -83,7 +83,7 @@ While Arrow uses the OS-provided timezone database on Linux and macOS, it requir
user-provided database on Windows. To download and extract the text version of
the IANA timezone database follow the instructions in the C++
:ref:`download-timezone-database` or use pyarrow utility function
-`pyarrow.util.download_tzdata_on_windows()` that does the same.
+``pyarrow.util.download_tzdata_on_windows()`` that does the same.
By default, the timezone database will be detected at
``%USERPROFILE%\Downloads\tzdata``.
If the database has been downloaded in a different location, you will need to set
diff --git a/docs/source/python/integration/extending.rst b/docs/source/python/integration/extending.rst
index b380fea7e9..d4d099bcf4 100644
--- a/docs/source/python/integration/extending.rst
+++ b/docs/source/python/integration/extending.rst
@@ -474,7 +474,7 @@ Toolchain Compatibility (Linux)
The Python wheels for Linux are built using the
`PyPA manylinux images <https://quay.io/organization/pypa>`_ which use
-the CentOS `devtoolset-9`. In addition to the other notes
+the CentOS ``devtoolset-9``. In addition to the other notes
above, if you are compiling C++ using these shared libraries, you will need
to make sure you use a compatible toolchain as well or you might see a
segfault during runtime.
diff --git a/docs/source/python/memory.rst b/docs/source/python/memory.rst
index 23474b9237..7b49d48ab2 100644
--- a/docs/source/python/memory.rst
+++ b/docs/source/python/memory.rst
@@ -46,7 +46,7 @@ parent-child relationships.
There are many implementations of ``arrow::Buffer``, but they all provide a
standard interface: a data pointer and length. This is similar to Python's
-built-in `buffer protocol` and ``memoryview`` objects.
+built-in ``buffer protocol`` and ``memoryview`` objects.
A :class:`Buffer` can be created from any Python object implementing
the buffer protocol by calling the :func:`py_buffer` function. Let's consider
diff --git a/docs/source/python/timestamps.rst b/docs/source/python/timestamps.rst
index cecbd5b595..80a1b7280c 100644
--- a/docs/source/python/timestamps.rst
+++ b/docs/source/python/timestamps.rst
@@ -24,7 +24,7 @@ Arrow/Pandas Timestamps
Arrow timestamps are stored as a 64-bit integer with column metadata to
associate a time unit (e.g. milliseconds, microseconds, or nanoseconds), and an
-optional time zone. Pandas (`Timestamp`) uses a 64-bit integer representing
+optional time zone. Pandas (``Timestamp``) uses a 64-bit integer representing
nanoseconds and an optional time zone.
Python/Pandas timestamp types without an associated time zone are referred to as
"Time Zone Naive". Python/Pandas timestamp types with an associated time zone are