(arrow) branch main updated: GH-38950: [Docs] Fix spelling (#38951)

jorisvandenbossche Fri, 01 Dec 2023 09:34:47 -0800

This is an automated email from the ASF dual-hosted git repository.

jorisvandenbossche pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git



The following commit(s) were added to refs/heads/main by this push:
     new 3531396803 GH-38950: [Docs] Fix spelling (#38951)
3531396803 is described below

commit 353139680311e809d2413ea46e17e1656069ac5e
Author: Josh Soref <[email protected]>
AuthorDate: Fri Dec 1 12:33:09 2023 -0500

    GH-38950: [Docs] Fix spelling (#38951)
    
    
    
    ### Rationale for this change
    
    ### What changes are included in this PR?
    
    Spelling fixes to docs/
    
    ### Are these changes tested?
    
    ### Are there any user-facing changes?
    
    * Closes: #38950
    
    Lead-authored-by: Josh Soref <[email protected]>
    Co-authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Joris Van den Bossche <[email protected]>
---
 docs/source/_static/theme_overrides.css            |  6 ++--
 docs/source/conf.py                                |  2 +-
 docs/source/cpp/acero/developer_guide.rst          | 34 +++++++++++-----------
 docs/source/cpp/acero/overview.rst                 |  4 +--
 docs/source/cpp/acero/substrait.rst                |  2 +-
 docs/source/cpp/acero/user_guide.rst               |  4 +--
 docs/source/cpp/compute.rst                        |  6 ++--
 docs/source/cpp/datatypes.rst                      |  2 +-
 .../cpp/examples/compute_and_write_example.rst     |  2 +-
 .../cpp/examples/dataset_skyhook_scan_example.rst  |  4 +--
 docs/source/cpp/overview.rst                       |  2 +-
 docs/source/cpp/tutorials/basic_arrow.rst          |  2 +-
 .../developers/continuous_integration/archery.rst  |  2 +-
 .../developers/continuous_integration/crossbow.rst |  4 +--
 .../developers/continuous_integration/docker.rst   |  4 +--
 .../developers/continuous_integration/overview.rst |  4 +--
 docs/source/developers/documentation.rst           |  2 +-
 docs/source/developers/guide/documentation.rst     |  2 +-
 docs/source/developers/guide/resources.rst         |  2 +-
 .../guide/step_by_step/finding_issues.rst          |  2 +-
 .../developers/guide/tutorials/r_tutorial.rst      |  2 +-
 docs/source/developers/java/building.rst           | 14 ++++-----
 docs/source/developers/release.rst                 |  6 ++--
 docs/source/developers/reviewing.rst               |  4 +--
 docs/source/format/ADBC.rst                        |  4 +--
 docs/source/format/CDataInterface.rst              |  2 +-
 docs/source/format/CDeviceDataInterface.rst        |  8 ++---
 docs/source/format/CanonicalExtensions.rst         |  2 +-
 docs/source/format/Columnar.rst                    |  2 +-
 docs/source/java/dataset.rst                       |  4 +--
 docs/source/python/api/compute.rst                 |  2 +-
 docs/source/python/dataset.rst                     |  2 +-
 docs/source/python/getting_involved.rst            |  2 +-
 docs/source/python/integration.rst                 |  2 +-
 docs/source/python/integration/python_java.rst     |  2 +-
 docs/source/python/interchange_protocol.rst        | 16 +++++-----
 docs/source/python/memory.rst                      |  2 +-
 docs/source/python/parquet.rst                     |  2 +-
 38 files changed, 85 insertions(+), 85 deletions(-)

diff --git a/docs/source/_static/theme_overrides.css b/docs/source/_static/theme_overrides.css
index bf84267aea..58f4554d11 100644
--- a/docs/source/_static/theme_overrides.css
+++ b/docs/source/_static/theme_overrides.css
@@ -33,7 +33,7 @@
   }
 }
 
-/* Contibuting landing page overview cards */
+/* Contributing landing page overview cards */
 
 .contrib-card {
   border-radius: 0;
@@ -68,7 +68,7 @@
 }
 
 /* This is the bootstrap CSS style for "table-striped". Since the theme does
-not yet provide an easy way to configure this globaly, it easier to simply
+not yet provide an easy way to configure this globally, it easier to simply
 include this snippet here than updating each table in all rst files to
 add ":class: table-striped" */
 
@@ -76,7 +76,7 @@ add ":class: table-striped" */
   background-color: rgba(0, 0, 0, 0.05);
 }
 
-/* Iprove the vertical spacing in the C++ API docs
+/* Improve the vertical spacing in the C++ API docs
 (ideally this should be upstreamed to the pydata-sphinx-theme */
 
 dl.cpp dd p {
diff --git a/docs/source/conf.py b/docs/source/conf.py
index f11d78fe05..cde0c2b31f 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -139,7 +139,7 @@ autodoc_default_options = {
 breathe_projects = {"arrow_cpp": "../../cpp/apidoc/xml"}
 breathe_default_project = "arrow_cpp"
 
-# Overriden conditionally below
+# Overridden conditionally below
 autodoc_mock_imports = []
 
 # copybutton configuration
diff --git a/docs/source/cpp/acero/developer_guide.rst b/docs/source/cpp/acero/developer_guide.rst
index c893e41ff8..331cd833b5 100644
--- a/docs/source/cpp/acero/developer_guide.rst
+++ b/docs/source/cpp/acero/developer_guide.rst
@@ -38,7 +38,7 @@ ExecNode is an abstract class with several pure virtual methods that control how
 --------------------------------
 
 This method is called once at the start of the plan.  Most nodes ignore this method (any
-neccesary initialization should happen in the construtor or Init).  However, source nodes
+necessary initialization should happen in the constructor or Init).  However, source nodes
 will typically provide a custom implementation.  Source nodes should schedule whatever tasks
 are needed to start reading and providing the data.  Source nodes are usually the primary
 creator of tasks in a plan.
@@ -52,7 +52,7 @@ Examples
 ^^^^^^^^
 
 * In the ``table_source`` node the input table is divided into batches.  A task is created for
-  each batch and that task calls ``InputRecieved`` on the node's output.
+  each batch and that task calls ``InputReceived`` on the node's output.
 * In the ``scan`` node a task is created to start listing fragments from the dataset.  Each listing
   task then creates tasks to read batches from the fragment, asynchronously.  When the batch is
   full read in then a continuation schedules a new task with the exec plan.  This task calls
@@ -95,7 +95,7 @@ Examples
 
 This method will be called once per input.  A node will call InputFinished on its output once it
 knows how many batches it will be sending to that output.  Normally this happens when the node is
-finished working.  For example, a scan node will call InputFinished once it has finsihed reading
+finished working.  For example, a scan node will call InputFinished once it has finished reading
 its files.  However, it could call it earlier if it knows (maybe from file metadata) how many
 batches will be created.
 
@@ -173,10 +173,10 @@ There is no expectation or requirement that a node sends any remaining data it h
 schedules tasks (e.g. a source node) should stop producing new data.
 
 In addition to plan-wide cancellation, a node may call this method on its input if it has decided
-that it has recevied all the data that it needs.  However, because of parallelism, a node may still
+that it has received all the data that it needs.  However, because of parallelism, a node may still
 receive a few calls to ``InputReceived`` after it has stopped its input.
 
-If any external reosurces are used then cleanup should happen as part of this call.
+If any external resources are used then cleanup should happen as part of this call.
 
 Examples
 ^^^^^^^^
@@ -194,7 +194,7 @@ Initialization / Construction / Destruction
 Simple initialization logic (that cannot error) can be done in the constructor.  If the initialization
 logic may return an invalid status then it can either be done in the exec node's factory method or
 the ``Init`` method.  The factory method is preferred for simple validation.  The ``Init`` method is
-preferred if the intialization might do expensive allocation or other resource consumption.  ``Init`` will
+preferred if the initialization might do expensive allocation or other resource consumption.  ``Init`` will
 always be called before ``StartProducing`` is called.  Initialization could also be done in
 ``StartProducing`` but keep in mind that other nodes may have started by that point.
 
@@ -264,7 +264,7 @@ have 20 files and 10 cores and you want to read and sort all the data.  You coul
 2 files to read and sort those files.  Then you could create one extra plan that takes the input from these
 10 child plans and merges the 10 input streams in a sorted fashion.
 
-This approach is popular because it is how queries are distributed across mulitple servers and so it
+This approach is popular because it is how queries are distributed across multiple servers and so it
 is widely supported and well understood.  Acero does not do this today but there is no reason to prevent it.
 Adding shuffle & partition nodes to Acero should be a high priority and would enable Acero to be used by
 distributed systems.  Once that has been done then it should be possible to do a local shuffle (local
@@ -308,7 +308,7 @@ more complex to implement.
 
 Due to a lack of standard C++ async APIs, Acero uses a combination of the two approaches.  Acero has two thread pools.
 The first is the CPU thread pool.  This thread pool has one thread per core.  Tasks in this thread pool should never
-block (beyond minor delays for synchornization) and should generally be actively using CPU as much as possible.  Threads
+block (beyond minor delays for synchronization) and should generally be actively using CPU as much as possible.  Threads
 on the I/O thread pool are expected to spend most of the time idle.  They should avoid doing any CPU-intensive work.
 Their job is basically to wait for data to be available and schedule follow-up tasks on the CPU thread pool.
 
@@ -329,7 +329,7 @@ exec nodes, scan, project, and then filter (this is a very common use case).  No
 In a task-per-operator model we would have tasks like "Scan Batch 5", "Project Batch 5", and "Filter Batch 5".  Each
 of those tasks is potentially going to access the same data.  For example, maybe the `project` and `filter` nodes need
 to read the same column.  A column which is intially created in a decode phase of the `scan` node.  To maximize cache
-utiliziation we would need to carefully schedule our tasks to ensure that all three of those tasks are run consecutively
+utilization we would need to carefully schedule our tasks to ensure that all three of those tasks are run consecutively
 and assigned to the same CPU core.
 
 To avoid this problem we design tasks that run through as many nodes as possible before the task ends.  This sequence
@@ -378,7 +378,7 @@ yet had to address this problem.  Let's go through some common situations:
    locality.  However, since Acero uses a task-per-pipeline model there isn't much lost opportunity for cache
    parallelism that a scheduler could reclaim.  Tasks only end when there is no more work that can be done with the data.
 
-While there is not much prioritzation in place in Acero today we do have the tools to apply it should we need to.
+While there is not much prioritization in place in Acero today we do have the tools to apply it should we need to.
 
 .. note::
    In addition to the AsyncTaskScheduler there is another class called the TaskScheduler.  This class predates the
@@ -391,7 +391,7 @@ Intra-node Parallelism
 
 Some nodes can potentially exploit parallelism within a task.  For example, in the scan node we can decode
 columns in parallel.  In the hash join node, parallelism is sometimes exploited for complex tasks such as
-building the hash table.  This sort of parallelism is less common but not neccesarily discouraged.  Profiling should
+building the hash table.  This sort of parallelism is less common but not necessarily discouraged.  Profiling should
 be done first though to ensure that this extra parallelism will be helpful in your workload.
 
 All Work Happens in Tasks
@@ -412,7 +412,7 @@ Ordered Execution
 =================
 
 Some nodes either establish an ordering to their outgoing batches or they need to be able to process batches in order.
-Acero handles ordering using the `batch_index` property on an ExecBatch.  If a node has a determinstic output order
+Acero handles ordering using the `batch_index` property on an ExecBatch.  If a node has a deterministic output order
 then it should apply a batch index on batches that it emits.  For example, the OrderByNode applies a new ordering to
 batches (regardless of the incoming ordering).  The scan node is able to attach an implicit ordering to batches which
 reflects the order of the rows in the files being scanned.
@@ -458,7 +458,7 @@ Profiling & Tracing
 ===================
 
 Acero's tracing is currently half-implemented and there are major gaps in profiling tools.  However, there has been some
-effort at tracing with open telemetry and most of the neccesary pieces are in place.  The main thing currently lacking is
+effort at tracing with open telemetry and most of the necessary pieces are in place.  The main thing currently lacking is
 some kind of effective visualization of the tracing results.
 
 In order to use the tracing that is present today you will need to build with Arrow with `ARROW_WITH_OPENTELEMETRY=ON`.
@@ -521,7 +521,7 @@ any particular engine design.  For example, the hash join node uses utilities su
 and an exec batch builder.  Other places share implementations of sequencing queues and row segmenters.  The node
 itself should be kept minimal and simply maps from Acero to the abstraction.
 
-This helps to decouple designs from Acero's design details and allows them to be more resilant to changes in the
+This helps to decouple designs from Acero's design details and allows them to be more resilient to changes in the
 engine.  It also helps to promote these abstractions as capabilities on their own.  Either for use in other engines
 or for potential new additions to pyarrow as compute utilities.
 
@@ -642,7 +642,7 @@ OrderBySink and SelectKSink
 ---------------------------
 
 These two exec nodes provided custom sink implementations.  They were written before ordered execution
-was added to Acero and were the only way to generate ordered ouptut.  However, they had to be placed
+was added to Acero and were the only way to generate ordered output.  However, they had to be placed
 at the end of a plan and the fact that they were custom sink nodes made them difficult to describe with
 Declaration.  The OrderByNode and FetchNode replace these.  These are kept at the moment until existing
 bindings move away from them.
@@ -680,7 +680,7 @@ Because of this, we highly recommend taking the following steps:
 
 * Any PR will need to have the following:
 
-  * Unit tests convering the new functionality
+  * Unit tests converting the new functionality
 
   * Microbenchmarks if there is any significant compute work going on
 
@@ -688,5 +688,5 @@ Because of this, we highly recommend taking the following steps:
 
   * Updates to the API reference and this guide
 
-  * Passing CI (you can enable Github Actions on your fork and that will allow most CI jobs to run before
+  * Passing CI (you can enable GitHub Actions on your fork and that will allow most CI jobs to run before
     you create your PR)
diff --git a/docs/source/cpp/acero/overview.rst b/docs/source/cpp/acero/overview.rst
index 751b8d2c28..c569f82b09 100644
--- a/docs/source/cpp/acero/overview.rst
+++ b/docs/source/cpp/acero/overview.rst
@@ -58,7 +58,7 @@ A Library for Data Scientists
 Acero is not intended to be used directly by data scientists.  It is expected that
 end users will typically be using some kind of frontend.  For example, Pandas, Ibis,
 or SQL.  The API for Acero is focused around capabilities and available algorithms.
-However, such users may be intersted in knowing more about how Acero works so that
+However, such users may be interested in knowing more about how Acero works so that
 they can better understand how the backend processing for their libraries operates.
 
 A Database
@@ -149,7 +149,7 @@ strings to uppercase strings would not be a part of the core Arrow library becau
 require examining the contents of the array.
 
 The compute module expands on the core library and provides functions which analyze and
-transform data.  The compute module's capabilites are all exposed via a function registry.
+transform data.  The compute module's capabilities are all exposed via a function registry.
 An Arrow "function" accepts zero or more arrays, batches, or tables, and produces an array,
 batch, or table.  In addition, function calls can be combined, along with field references
 and literals, to form an expression (a tree of function calls) which the compute module can
diff --git a/docs/source/cpp/acero/substrait.rst b/docs/source/cpp/acero/substrait.rst
index 0d1c5bd02f..797b2407f9 100644
--- a/docs/source/cpp/acero/substrait.rst
+++ b/docs/source/cpp/acero/substrait.rst
@@ -229,7 +229,7 @@ Functions
     for the functions ``and``, ``or``, ``xor``
 
 * Substrait has not yet clearly identified the form that URIs should take for
-  standard functions.  Acero will look for the URIs to the ``main`` Github branch.
+  standard functions.  Acero will look for the URIs to the ``main`` GitHub branch.
   In other words, for the file ``functions_arithmetic.yaml`` Acero expects
   ``https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml``
 
diff --git a/docs/source/cpp/acero/user_guide.rst b/docs/source/cpp/acero/user_guide.rst
index 333149caa7..eca1a01047 100644
--- a/docs/source/cpp/acero/user_guide.rst
+++ b/docs/source/cpp/acero/user_guide.rst
@@ -171,7 +171,7 @@ can support all of these cases and can even support unique and custom situations
 
 There are pre-defined source nodes that cover the most common input scenarios.  These are listed below.  However,
 if your source data is unique then you will need to use the generic ``source`` node.  This node expects you to
-provide an asycnhronous stream of batches and is covered in more detail :ref:`here <stream_execution_source_docs>`.
+provide an asynchronous stream of batches and is covered in more detail :ref:`here <stream_execution_source_docs>`.
 
 .. _ExecNode List:
 
@@ -710,7 +710,7 @@ defining a join. The hash_join supports
 <https://en.wikipedia.org/wiki/Join_(SQL)>`_. 
 Also the join-key (i.e. the column(s) to join on), and suffixes (i.e a suffix term like "_x"
 which can be appended as a suffix for column names duplicated in both left and right 
-relations.) can be set via the the join options. 
+relations.) can be set via the join options. 
 `Read more on hash-joins
 <https://en.wikipedia.org/wiki/Hash_join>`_. 
 
diff --git a/docs/source/cpp/compute.rst b/docs/source/cpp/compute.rst
index 44f43cbc87..47af976415 100644
--- a/docs/source/cpp/compute.rst
+++ b/docs/source/cpp/compute.rst
@@ -155,7 +155,7 @@ is signed. For example:
 | float32, int64    | float32              | int64 is wider, still promotes to float32      |
 +-------------------+----------------------+------------------------------------------------+
 
-In particulary, note that comparing a ``uint64`` column to an ``int16`` column
+In particular, note that comparing a ``uint64`` column to an ``int16`` column
 may emit an error if one of the ``uint64`` values cannot be expressed as the
 common type ``int64`` (for example, ``2 ** 63``).
 
@@ -1622,10 +1622,10 @@ Cumulative Functions
 ~~~~~~~~~~~~~~~~~~~~
 
 Cumulative functions are vector functions that perform a running accumulation on 
-their input using a given binary associative operation with an identidy element 
+their input using a given binary associative operation with an identity element 
 (a monoid) and output an array containing the corresponding intermediate running 
 values. The input is expected to be of numeric type. By default these functions 
-do not detect overflow. They are alsoavailable in an overflow-checking variant, 
+do not detect overflow. They are also available in an overflow-checking variant, 
 suffixed ``_checked``, which returns an ``Invalid`` :class:`Status` when 
 overflow is detected.
 
diff --git a/docs/source/cpp/datatypes.rst b/docs/source/cpp/datatypes.rst
index 922fef1498..4e1fe76b4d 100644
--- a/docs/source/cpp/datatypes.rst
+++ b/docs/source/cpp/datatypes.rst
@@ -186,7 +186,7 @@ here is how one might sum across columns of arbitrary numeric types:
    
      // Default implementation
      arrow::Status Visit(const arrow::Array& array) {
-       return arrow::Status::NotImplemented("Can not compute sum for array of type ",
+       return arrow::Status::NotImplemented("Cannot compute sum for array of type ",
                                             array.type()->ToString());
      }
    
diff --git a/docs/source/cpp/examples/compute_and_write_example.rst b/docs/source/cpp/examples/compute_and_write_example.rst
index c4480a5f5c..e66d3ced55 100644
--- a/docs/source/cpp/examples/compute_and_write_example.rst
+++ b/docs/source/cpp/examples/compute_and_write_example.rst
@@ -23,6 +23,6 @@ Compute and Write CSV Example
 
 The file ``cpp/examples/arrow/compute_and_write_csv_example.cc`` located inside 
 the source tree contains an example of creating a table of two numerical columns 
-and then comparing the magnitudes of the entries in the columns and wrting out to 
+and then comparing the magnitudes of the entries in the columns and writing out to 
 a CSV file with the column entries and their comparisons.  The code in the example
 is documented.
diff --git a/docs/source/cpp/examples/dataset_skyhook_scan_example.rst b/docs/source/cpp/examples/dataset_skyhook_scan_example.rst
index 75a3954cf3..4f7d558dcf 100644
--- a/docs/source/cpp/examples/dataset_skyhook_scan_example.rst
+++ b/docs/source/cpp/examples/dataset_skyhook_scan_example.rst
@@ -26,8 +26,8 @@ The file ``cpp/examples/arrow/dataset_skyhook_scan_example.cc``
 located inside the source tree contains an example of using Skyhook to
 offload filters and projections to a Ceph cluster.
 
-Instuctions
-===========
+Instructions
+============
 
 .. note::
    The instructions below are for Ubuntu 20.04 or above.
diff --git a/docs/source/cpp/overview.rst b/docs/source/cpp/overview.rst
index 33f075bd18..d67e0a7dec 100644
--- a/docs/source/cpp/overview.rst
+++ b/docs/source/cpp/overview.rst
@@ -36,7 +36,7 @@ The one-dimensional layer
 -------------------------
 
 **Data types** govern the *logical* interpretation of *physical* data.
-Many operations in Arrow are parametered, at compile-time or at runtime,
+Many operations in Arrow are parameterized, at compile-time or at runtime,
 by a data type.
 
 **Arrays** assemble one or several buffers with a data type, allowing to
diff --git a/docs/source/cpp/tutorials/basic_arrow.rst b/docs/source/cpp/tutorials/basic_arrow.rst
index 06f5fde32e..409dfcc40d 100644
--- a/docs/source/cpp/tutorials/basic_arrow.rst
+++ b/docs/source/cpp/tutorials/basic_arrow.rst
@@ -241,7 +241,7 @@ Making a Table
 One particularly useful thing we can do with the :class:`ChunkedArrays <ChunkedArray>` from the previous section is creating 
 :class:`Tables <Table>`. Much like a :class:`RecordBatch`, a :class:`Table` stores tabular data. However, a 
 :class:`Table` does not guarantee contiguity, due to being made up of :class:`ChunkedArrays <ChunkedArray>`.
-This can be useful for logic, paralellizing work, for fitting chunks into cache, or exceeding the 2,147,483,647 row limit
+This can be useful for logic, parallelizing work, for fitting chunks into cache, or exceeding the 2,147,483,647 row limit
 present in :class:`Array` and, thus, :class:`RecordBatch`.
 
 If you read up to :class:`RecordBatch`, you may note that the :class:`Table` constructor in the following code is  
diff --git a/docs/source/developers/continuous_integration/archery.rst b/docs/source/developers/continuous_integration/archery.rst
index 4b9f1f300e..d190a0a96c 100644
--- a/docs/source/developers/continuous_integration/archery.rst
+++ b/docs/source/developers/continuous_integration/archery.rst
@@ -68,7 +68,7 @@ You can inspect Archery usage by passing the ``--help`` flag:
      linking      Quick and dirty utilities for checking library linkage.
      lint         Check Arrow source tree for errors
      numpydoc     Lint python docstring with NumpyDoc
-     release      Release releated commands.
+     release      Release related commands.
      trigger-bot
 
 Archery exposes independent subcommands, each of which provides dedicated
diff --git a/docs/source/developers/continuous_integration/crossbow.rst b/docs/source/developers/continuous_integration/crossbow.rst
index 6308f077ac..50ac607f4d 100644
--- a/docs/source/developers/continuous_integration/crossbow.rst
+++ b/docs/source/developers/continuous_integration/crossbow.rst
@@ -75,7 +75,7 @@ The following guide depends on GitHub, but theoretically any git
 server can be used.
 
 If you are not using the `ursacomputing/crossbow`_
-repository, you will need to complete the first two steps, otherwise procede
+repository, you will need to complete the first two steps, otherwise proceed
 to step 3:
 
 1. `Create the queue repository`_
@@ -245,7 +245,7 @@ see its help page:
 .. _Wheels: python-wheels
 .. _Linux packages: linux-packages
 .. _Create the queue repository: https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-new-repository
-.. _Github Actions: https://docs.github.com/en/actions/quickstart
+.. _GitHub Actions: https://docs.github.com/en/actions/quickstart
 .. _Travis CI: https://travis-ci.com/getting-started/
 .. _Azure Pipelines: https://docs.microsoft.com/en-us/azure/devops/pipelines/get-started/pipelines-sign-up
 .. _auto cancellation: https://docs.travis-ci.com/user/customizing-the-build/#building-only-the-latest-commit
diff --git a/docs/source/developers/continuous_integration/docker.rst b/docs/source/developers/continuous_integration/docker.rst
index 49cbffe5a4..68f3c7d709 100644
--- a/docs/source/developers/continuous_integration/docker.rst
+++ b/docs/source/developers/continuous_integration/docker.rst
@@ -199,8 +199,8 @@ For detailed examples see the docker-compose.yml.
 Build Scripts
 ~~~~~~~~~~~~~
 
-The scripts maintainted under ci/scripts directory should be kept
-parametrizable but reasonably minimal to clearly encapsulate the tasks it is
+The scripts maintained under ci/scripts directory should be kept
+parameterizable but reasonably minimal to clearly encapsulate the tasks it is
 responsible for. Like:
 
 - ``cpp_build.sh``: build the C++ implementation without running the tests.
diff --git a/docs/source/developers/continuous_integration/overview.rst b/docs/source/developers/continuous_integration/overview.rst
index 3e155bf600..93e74f269d 100644
--- a/docs/source/developers/continuous_integration/overview.rst
+++ b/docs/source/developers/continuous_integration/overview.rst
@@ -34,7 +34,7 @@ One thing to note is that some of the services defined in ``docker-compose.yml``
 
 There are numerous important directories in the Arrow project which relate to CI:
 
-- ``.github/worflows`` - workflows that are run via GitHub actions and are triggered by things like pull requests being submitted or merged
+- ``.github/workflows`` - workflows that are run via GitHub actions and are triggered by things like pull requests being submitted or merged
 - ``dev/tasks`` - containing extended jobs triggered/submitted via ``archery crossbow submit ...``, typically nightly builds or relating to the release process
 - ``ci/`` - containing scripts, dockerfiles, and any supplemental files, e.g. patch files, conda environment files, vcpkg triplet files.
 
@@ -46,7 +46,7 @@ Instead of thinking about Arrow CI in terms of files and folders, it may be conc
 Action-triggered builds
 -----------------------
 
-The ``.yml`` files in ``.github/worflows`` are workflows which are run on GitHub in response to specific actions.  The majority of workflows in this directory are Arrow implementation-specific and are run when changes are made which affect code relevant to that language's implementation, but other workflows worth noting are:
+The ``.yml`` files in ``.github/workflows`` are workflows which are run on GitHub in response to specific actions.  The majority of workflows in this directory are Arrow implementation-specific and are run when changes are made which affect code relevant to that language's implementation, but other workflows worth noting are:
 
 - ``archery.yml`` - if changes are made to the Archery tool or tasks which it runs, this workflow runs the necessary validation checks
 - ``comment_bot.yml`` - triggers certain actions by listening on github pull request comments for the following strings:
diff --git a/docs/source/developers/documentation.rst b/docs/source/developers/documentation.rst
index fcd8e84c7a..8b1ea28c0f 100644
--- a/docs/source/developers/documentation.rst
+++ b/docs/source/developers/documentation.rst
@@ -136,7 +136,7 @@ GitHub Actions response, where you need to click on the Crossbow build badge:
 
 .. figure:: ./images/docs_preview_1.jpeg
    :scale: 70 %
-   :alt: Github-actions response with the crossbow build status.
+   :alt: GitHub Actions response with the crossbow build status.
 
    Crossbow build status
 
diff --git a/docs/source/developers/guide/documentation.rst b/docs/source/developers/guide/documentation.rst
index 22e8e0eae4..3bb3bebef5 100644
--- a/docs/source/developers/guide/documentation.rst
+++ b/docs/source/developers/guide/documentation.rst
@@ -84,7 +84,7 @@ library. Source folder includes:
 
 - **C++ documentation** section: ``docs/source/cpp``.
 - **Development** section: ``docs/source/developers``.
-- **Specificatons and protocols** section: ``docs/source/format``.
+- **Specifications and protocols** section: ``docs/source/format``.
 - **Language documentation**
 
   **C (GLib), Java, JavaScript** and **Python** documentation is located
diff --git a/docs/source/developers/guide/resources.rst b/docs/source/developers/guide/resources.rst
index f6e8db61e5..f350f469af 100644
--- a/docs/source/developers/guide/resources.rst
+++ b/docs/source/developers/guide/resources.rst
@@ -62,7 +62,7 @@ Additional information
 
 Other resources
 ---------------
-Github
+GitHub
 
 - `GitHub docs: Fork a repo <https://docs.github.com/en/get-started/quickstart/fork-a-repo>`_
 - `GitHub: Creating a pull request from a fork <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork>`_
diff --git a/docs/source/developers/guide/step_by_step/finding_issues.rst b/docs/source/developers/guide/step_by_step/finding_issues.rst
index a3af1640a3..390c56a81c 100644
--- a/docs/source/developers/guide/step_by_step/finding_issues.rst
+++ b/docs/source/developers/guide/step_by_step/finding_issues.rst
@@ -74,7 +74,7 @@ in the comments.
 Also, do not hesitate to ask questions in the comment. You can get some
 pointers about where to start and similar issues already solved.
 
-**What if an issue is already asigned?**
+**What if an issue is already assigned?**
 When in doubt, comment on the issue asking if they mind if you try to put
 together a pull request; interpret no response to mean that you’re free to
 proceed.
diff --git a/docs/source/developers/guide/tutorials/r_tutorial.rst b/docs/source/developers/guide/tutorials/r_tutorial.rst
index 5908064a7d..62d5cfcbc7 100644
--- a/docs/source/developers/guide/tutorials/r_tutorial.rst
+++ b/docs/source/developers/guide/tutorials/r_tutorial.rst
@@ -86,7 +86,7 @@ link of the main repository to our upstream.
 Building R package
 ------------------
 
-The steps to follow for for building the R package differs depending on the operating
+The steps to follow for building the R package differs depending on the operating
 system you are using. For this reason we will only refer to
 the instructions for the building process in this tutorial.
 
diff --git a/docs/source/developers/java/building.rst b/docs/source/developers/java/building.rst
index f9f44d5e97..0e831915e0 100644
--- a/docs/source/developers/java/building.rst
+++ b/docs/source/developers/java/building.rst
@@ -107,7 +107,7 @@ We can build these manually or we can use `Archery`_ to build them using a Docke
 Maven
 ~~~~~
 
-- To build only the JNI C Data Interface library (MacOS / Linux):
+- To build only the JNI C Data Interface library (macOS / Linux):
 
   .. code-block:: text
 
@@ -128,7 +128,7 @@ Maven
       $ dir "../java-dist/bin/x86_64"
       |__ arrow_cdata_jni.dll
 
-- To build all JNI libraries (MacOS / Linux) except the JNI C Data Interface library:
+- To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:
 
   .. code-block:: text
 
@@ -153,7 +153,7 @@ Maven
 CMake
 ~~~~~
 
-- To build only the JNI C Data Interface library (MacOS / Linux):
+- To build only the JNI C Data Interface library (macOS / Linux):
 
   .. code-block:: text
 
@@ -192,7 +192,7 @@ CMake
       $ dir "java-dist/bin"
       |__ arrow_cdata_jni.dll
 
-- To build all JNI libraries (MacOS / Linux) except the JNI C Data Interface library:
+- To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:
 
   .. code-block::
 
@@ -393,7 +393,7 @@ Installing Nightly Packages
     These packages are not official releases. Use them at your own risk.
 
 Arrow nightly builds are posted on the mailing list at `[email protected]`_.
-The artifacts are uploaded to GitHub. For example, for 2022/07/30, they can be found at `Github Nightly`_.
+The artifacts are uploaded to GitHub. For example, for 2022/07/30, they can be found at `GitHub Nightly`_.
 
 
 Installing from Apache Nightlies
@@ -429,7 +429,7 @@ Installing Manually
 -------------------
 
 1. Decide nightly packages repository to use, for example: https://github.com/ursacomputing/crossbow/releases/tag/nightly-packaging-2022-07-30-0-github-java-jars
-2. Add packages to your pom.xml, for example: flight-core (it depends on: arrow-format, arrow-vector, arrow-memeory-core and arrow-memory-netty).
+2. Add packages to your pom.xml, for example: flight-core (it depends on: arrow-format, arrow-vector, arrow-memory-core and arrow-memory-netty).
 
    .. code-block:: xml
 
@@ -540,4 +540,4 @@ Installing Manually
 6. Compile your project like usual with ``mvn clean install``.
 
 .. [email protected]: https://lists.apache.org/[email protected]
-.. _Github Nightly: https://github.com/ursacomputing/crossbow/releases/tag/nightly-packaging-2022-07-30-0-github-java-jars
+.. _GitHub Nightly: https://github.com/ursacomputing/crossbow/releases/tag/nightly-packaging-2022-07-30-0-github-java-jars
diff --git a/docs/source/developers/release.rst b/docs/source/developers/release.rst
index 5b7726f58d..0ff8e3a824 100644
--- a/docs/source/developers/release.rst
+++ b/docs/source/developers/release.rst
@@ -183,7 +183,7 @@ Build source and binaries and submit them
     
     # Sign and upload the Java artifacts
     #
-    # Note that you need to press the "Close" button manually by Web interfacec
+    # Note that you need to press the "Close" button manually by Web interface
     # after you complete the script:
     #   https://repository.apache.org/#stagingRepositories
     dev/release/06-java-upload.sh <version> <rc-number>
@@ -383,7 +383,7 @@ Be sure to go through on the following checklist:
       cd -
 
       # dev/release/post-12-msys2.sh 10.0.0 ../MINGW-packages
-      dev/release/post-12-msys2.sh X.Y.Z <YOUR_MINGW_PACAKGES_FORK>
+      dev/release/post-12-msys2.sh X.Y.Z <YOUR_MINGW_PACKAGES_FORK>
 
    This script pushes a ``arrow-X.Y.Z`` branch to your ``msys2/MINGW-packages`` fork. You need to create a pull request from the ``arrow-X.Y.Z`` branch with ``arrow: Update to X.Y.Z`` title on your Web browser.
 
@@ -419,7 +419,7 @@ Be sure to go through on the following checklist:
 
    The package upload requires npm and yarn to be installed and 2FA to be configured on your account.
 
-   When you have access, you can publish releases to npm by running the the following script:
+   When you have access, you can publish releases to npm by running the following script:
 
    .. code-block:: Bash
 
diff --git a/docs/source/developers/reviewing.rst b/docs/source/developers/reviewing.rst
index 9a2e3dd7cc..b6e0c1f402 100644
--- a/docs/source/developers/reviewing.rst
+++ b/docs/source/developers/reviewing.rst
@@ -217,11 +217,11 @@ Social aspects
 
 * If you know someone who has the competence to help on a blocking issue
   and past experience suggests they may be willing to do so, feel free to
-  add them to the discussion (for example by gently pinging their Github
+  add them to the discussion (for example by gently pinging their GitHub
   handle).
 
 * If the contributor has stopped giving feedback or updating their PR,
-  perhaps they're not interested any more, but perhaps also they're stuck
+  perhaps they're not interested anymore, but perhaps also they're stuck
   on some issue and feel unable to push their contribution any further.
   Don't hesitate to ask (*"I see this PR hasn't seen any updates recently,
   are you stuck on something? Do you need any help?"*).
diff --git a/docs/source/format/ADBC.rst b/docs/source/format/ADBC.rst
index 0bd835e97d..f90ab24d1b 100644
--- a/docs/source/format/ADBC.rst
+++ b/docs/source/format/ADBC.rst
@@ -199,8 +199,8 @@ bypass this wrapper.
    implement the same protocol to try to reuse each other's work,
    e.g. several databases implement the Postgres wire protocol to
    benefit from its driver implementations.  But the protocol itself
-   was not designed with multiple databases in mind, nor are they
-   generally meant to be used directly by applications.
+   was not designed with multiple databases in mind, nor are the
+   protocols generally meant to be used directly by applications.
 
    Some database-specific protocols are Arrow-native, like those of
    BigQuery and ClickHouse.  Flight SQL additionally is meant to be
diff --git a/docs/source/format/CDataInterface.rst b/docs/source/format/CDataInterface.rst
index c9beddabed..812212f536 100644
--- a/docs/source/format/CDataInterface.rst
+++ b/docs/source/format/CDataInterface.rst
@@ -39,7 +39,7 @@ corresponding C FFI declarations.
 Applications and libraries can therefore work with Arrow memory without
 necessarily using Arrow libraries or reinventing the wheel. Developers can
 choose between tight integration
-with the Arrow *software project* (benefitting from the growing array of
+with the Arrow *software project* (benefiting from the growing array of
 facilities exposed by e.g. the C++ or Java implementations of Apache Arrow,
 but with the cost of a dependency) or minimal integration with the Arrow
 *format* only.
diff --git a/docs/source/format/CDeviceDataInterface.rst b/docs/source/format/CDeviceDataInterface.rst
index a584852df8..76b7132681 100644
--- a/docs/source/format/CDeviceDataInterface.rst
+++ b/docs/source/format/CDeviceDataInterface.rst
@@ -61,7 +61,7 @@ Goals
 * Make it easy for third-party projects to implement support with little
   initial investment.
 * Allow zero-copy sharing of Arrow formatted device memory between
-  independant runtimes and components running in the same process.
+  independent runtimes and components running in the same process.
 * Avoid the need for one-to-one adaptation layers such as the
   `CUDA Array Interface`_ for Python processes to pass CUDA data.
 * Enable integration without explicit dependencies (either at compile-time
@@ -445,7 +445,7 @@ could be used for any device:
         array->release = NULL;
     }
 
-    void export_int32_device_array(void* cudaAllocdPtr,
+    void export_int32_device_array(void* cudaAllocedPtr,
                                    cudaStream_t stream,
                                    int64_t length,
                                    struct ArrowDeviceArray* array) {
@@ -492,7 +492,7 @@ could be used for any device:
         array->array.buffers = (const void**)malloc(sizeof(void*) * array->array.n_buffers);
         assert(array->array.buffers != NULL);
         array->array.buffers[0] = NULL;
-        array->array.buffers[1] = cudaAllocdPtr;
+        array->array.buffers[1] = cudaAllocedPtr;
     }
 
     // calling the release callback should be done using the array member
@@ -629,7 +629,7 @@ Result lifetimes
 ''''''''''''''''
 
 The data returned by the ``get_schema`` and ``get_next`` callbacks must be
-released independantly. Their lifetimes are not tied to that of
+released independently. Their lifetimes are not tied to that of
 ``ArrowDeviceArrayStream``.
 
 Stream lifetime
diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst
index 084b6e6289..86cfab718d 100644
--- a/docs/source/format/CanonicalExtensions.rst
+++ b/docs/source/format/CanonicalExtensions.rst
@@ -130,7 +130,7 @@ Fixed shape tensor
 
     ``{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}``
 
-    This is the physical layout shape and the the shape of the logical
+    This is the physical layout shape and the shape of the logical
     layout would in this case be ``[500, 100, 200]``.
 
 .. note::
diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst
index 9bdee37d18..a6632fa2cf 100644
--- a/docs/source/format/Columnar.rst
+++ b/docs/source/format/Columnar.rst
@@ -988,7 +988,7 @@ access is less efficient.)
    to the length of the array and this would be confusing.
 
 
-A run must have have a length of at least 1. This means the values in the
+A run must have a length of at least 1. This means the values in the
 run ends array all are positive and in strictly ascending order. A run end cannot be
 null.
 
diff --git a/docs/source/java/dataset.rst b/docs/source/java/dataset.rst
index a4381e0814..ec816052e7 100644
--- a/docs/source/java/dataset.rst
+++ b/docs/source/java/dataset.rst
@@ -149,7 +149,7 @@ ScanOptions:
 
     ScanOptions options = new ScanOptions(32768, Optional.empty());
 
-Or use shortcut construtor:
+Or use shortcut constructor:
 
 .. code-block:: Java
 
@@ -199,7 +199,7 @@ Native Memory Management
 ========================
 
 To gain better performance and reduce code complexity, Java
-``FileSystemDataset`` internally relys on C++
+``FileSystemDataset`` internally relies on C++
 ``arrow::dataset::FileSystemDataset`` via JNI.
 As a result, all Arrow data read from ``FileSystemDataset`` is supposed to be
 allocated off the JVM heap. To manage this part of memory, an utility class
diff --git a/docs/source/python/api/compute.rst b/docs/source/python/api/compute.rst
index f29d4db394..4ee364fcf6 100644
--- a/docs/source/python/api/compute.rst
+++ b/docs/source/python/api/compute.rst
@@ -53,7 +53,7 @@ Cumulative Functions
 --------------------
 
 Cumulative functions are vector functions that perform a running accumulation on 
-their input using a given binary associative operation with an identidy element 
+their input using a given binary associative operation with an identity element 
 (a monoid) and output an array containing the corresponding intermediate running 
 values. The input is expected to be of numeric type. By default these functions 
 do not detect overflow. They are also
diff --git a/docs/source/python/dataset.rst b/docs/source/python/dataset.rst
index 417a8a049c..daab36f9a7 100644
--- a/docs/source/python/dataset.rst
+++ b/docs/source/python/dataset.rst
@@ -708,7 +708,7 @@ into memory:
 
 After the above example runs our data will be in dataset_root/1 and dataset_root/2
 directories.  In this simple example we are not changing the structure of the data
-(only the directory naming schema) but you could also use this mechnaism to change
+(only the directory naming schema) but you could also use this mechanism to change
 which columns are used to partition the dataset.  This is useful when you expect to
 query your data in specific ways and you can utilize partitioning to reduce the
 amount of data you need to read.
diff --git a/docs/source/python/getting_involved.rst b/docs/source/python/getting_involved.rst
index 2271ad3cc0..7b3bcf2ac5 100644
--- a/docs/source/python/getting_involved.rst
+++ b/docs/source/python/getting_involved.rst
@@ -56,7 +56,7 @@ used as foundations to build easier to use entities.
   as is without modification.
 * The ``lib.pyx`` file is where the majority of the core C++ libarrow 
   capabilities are exposed to Python. Most of the implementation of this
-  module relies on included ``*.pxi`` files where the specificic pieces
+  module relies on included ``*.pxi`` files where the specific pieces
   are built. While being exposed to Python as ``pyarrow.lib`` its content
   should be considered internal. The public classes are then directly exposed
   in other modules (like ``pyarrow`` itself) by virtue of importing them from
diff --git a/docs/source/python/integration.rst b/docs/source/python/integration.rst
index 997bc52102..1cafc3dbde 100644
--- a/docs/source/python/integration.rst
+++ b/docs/source/python/integration.rst
@@ -27,7 +27,7 @@ Developers can use Arrow to exchange data between various
 technologies and languages without incurring in any extra cost of
 marshalling/unmarshalling the data. The Arrow bindings and Arrow
 native libraries on the various platforms will all understand Arrow data
-natively wihout the need to decode it.
+natively without the need to decode it.
 
 This allows to easily integrate PyArrow with other languages and technologies.
 
diff --git a/docs/source/python/integration/python_java.rst b/docs/source/python/integration/python_java.rst
index 8b086485cf..0a242a4d39 100644
--- a/docs/source/python/integration/python_java.rst
+++ b/docs/source/python/integration/python_java.rst
@@ -246,7 +246,7 @@ We can use ``maven`` to collect all dependencies and make them available in a si
     Instead of manually collecting dependencies, you could also rely on the
     ``maven-assembly-plugin`` to build a single ``jar`` with all dependencies.
 
-Once our package and all its depdendencies are available,
+Once our package and all its dependencies are available,
 we can invoke it from ``fillten_pyarrowjvm.py`` script that will
 import the ``FillTen`` class and print out the result of invoking ``FillTen.createArray``
 
diff --git a/docs/source/python/interchange_protocol.rst b/docs/source/python/interchange_protocol.rst
index e293699220..c354541a67 100644
--- a/docs/source/python/interchange_protocol.rst
+++ b/docs/source/python/interchange_protocol.rst
@@ -46,7 +46,7 @@ the consumer library can take and construct an object of it's own.
 .. code-block::
 
     >>> import pyarrow as pa
-    >>> table = pa.table({"n_atendees": [100, 10, 1]})
+    >>> table = pa.table({"n_attendees": [100, 10, 1]})
     >>> table.__dataframe__()
     <pyarrow.interchange.dataframe._PyArrowDataFrame object at ...>
 
@@ -72,20 +72,20 @@ pyarrow table with the use of the interchange protocol:
 
     >>> import pandas as pd
     >>> df = pd.DataFrame({
-    ...         "n_atendees": [100, 10, 1],
+    ...         "n_attendees": [100, 10, 1],
     ...         "country": ["Italy", "Spain", "Slovenia"],
     ...     })
     >>> df
-       n_atendees   country
-    0         100     Italy
-    1          10     Spain
-    2           1  Slovenia
+       n_attendees   country
+    0          100     Italy
+    1           10     Spain
+    2            1  Slovenia
     >>> from_dataframe(df)
     pyarrow.Table
-    n_atendees: int64
+    n_attendees: int64
     country: large_string
     ----
-    n_atendees: [[100,10,1]]
+    n_attendees: [[100,10,1]]
     country: [["Italy","Spain","Slovenia"]]
 
 We can do the same with a polars dataframe:
diff --git a/docs/source/python/memory.rst b/docs/source/python/memory.rst
index 76b06757c8..23474b9237 100644
--- a/docs/source/python/memory.rst
+++ b/docs/source/python/memory.rst
@@ -102,7 +102,7 @@ Let's allocate a resizable :class:`Buffer` from the default pool:
    pa.total_allocated_bytes()
 
 The default allocator requests memory in a minimum increment of 64 bytes. If
-the buffer is garbaged-collected, all of the memory is freed:
+the buffer is garbage-collected, all of the memory is freed:
 
 .. ipython:: python
 
diff --git a/docs/source/python/parquet.rst b/docs/source/python/parquet.rst
index 24e6aa4fc0..85a9674a68 100644
--- a/docs/source/python/parquet.rst
+++ b/docs/source/python/parquet.rst
@@ -428,7 +428,7 @@ metadata-only Parquet files. Note this is not a Parquet standard, but a
 convention set in practice by those frameworks.
 
 Using those files can give a more efficient creation of a parquet Dataset,
-since it can use the stored schema and and file paths of all row groups,
+since it can use the stored schema and file paths of all row groups,
 instead of inferring the schema and crawling the directories for all Parquet
 files (this is especially the case for filesystems where accessing files
 is expensive).
