This is an automated email from the ASF dual-hosted git repository.

thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git


The following commit(s) were added to refs/heads/main by this push:
     new 8a8dccb  ARROW-13727: Recipe to concatenate two tables (#76)
8a8dccb is described below

commit 8a8dccb5661428916b0796b0c3dc92d528803888
Author: Alessandro Molina <[email protected]>
AuthorDate: Thu Sep 30 16:19:08 2021 +0200

    ARROW-13727: Recipe to concatenate two tables (#76)
    
    * Recipe to concatenate two tables
    
    * Apply suggestions from code review
    
    Co-authored-by: Nic <[email protected]>
    
    Co-authored-by: Nic <[email protected]>
---
 python/source/data.rst | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/python/source/data.rst b/python/source/data.rst
index 527181d..bdcf648 100644
--- a/python/source/data.rst
+++ b/python/source/data.rst
@@ -137,3 +137,47 @@ function
 .. testoutput::
 
   0 .. 198
+
+Appending tables to an existing table
+=====================================
+
+If you have data split across two different tables, it is possible
+to concatenate their rows into a single table.
+
+If we have the list of Oscar nominations divided between two different tables:
+
+.. testcode::
+
+  import pyarrow as pa
+
+  oscar_nominations_1 = pa.table([
+    ["Meryl Streep", "Katharine Hepburn"],
+    [21, 12]
+  ], names=["actor", "nominations"])
+
+  oscar_nominations_2 = pa.table([
+    ["Jack Nicholson", "Bette Davis"],
+    [12, 10]
+  ], names=["actor", "nominations"])
+
+We can combine them into a single table using :func:`pyarrow.concat_tables`:
+
+.. testcode::
+
+  oscar_nominations = pa.concat_tables([oscar_nominations_1, 
+                                        oscar_nominations_2])
+
+  print(oscar_nominations.to_pydict())
+
+.. testoutput::
+
+  {'actor': ['Meryl Streep', 'Katharine Hepburn', 'Jack Nicholson', 'Bette Davis'], 'nominations': [21, 12, 12, 10]}
+
+.. note::
+
+  By default, appending two tables is a zero-copy operation that doesn't need to
+  copy or rewrite data. As tables are made of :class:`pyarrow.ChunkedArray`,
+  the result will be a table with multiple chunks, each pointing to the original
+  data that has been appended. Under some conditions, Arrow might have to
+  cast data from one type to another (if `promote=True`).  In such cases the data
+  will need to be copied and an extra cost will occur.
\ No newline at end of file
