This is an automated email from the ASF dual-hosted git repository.
thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git
The following commit(s) were added to refs/heads/main by this push:
new 42507ed Update python tests for version 6.0.0 (#98)
42507ed is described below
commit 42507edcbda72dd11beb5c085847a58d289fad64
Author: Alessandro Molina <[email protected]>
AuthorDate: Thu Oct 28 14:18:59 2021 +0200
Update python tests for version 6.0.0 (#98)
* Update python tests for version 6.0.0
* Adapt to new partitioning convention
---
python/source/create.rst | 10 +++++++++
python/source/data.rst | 33 ++++++++++++++++++----------
python/source/io.rst | 57 +++++++++++++++++++++++-------------------------
python/source/schema.rst | 8 +++++++
4 files changed, 67 insertions(+), 41 deletions(-)
diff --git a/python/source/create.rst b/python/source/create.rst
index 28773d9..b9ca220 100644
--- a/python/source/create.rst
+++ b/python/source/create.rst
@@ -94,6 +94,10 @@ by pairing multiple arrays with names for their columns
col1: int64
col2: string
col3: double
+ ----
+ col1: [[1,2,3,4,5]]
+ col2: [["a","b","c","d","e"]]
+ col3: [[1,2,3,4,5]]
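For reference, the table in this testoutput can be built roughly as follows (a minimal sketch; the array values are inferred from the expected output, not taken from the cookbook source):

    import pyarrow as pa

    col1 = pa.array([1, 2, 3, 4, 5])
    col2 = pa.array(["a", "b", "c", "d", "e"])
    col3 = pa.array([1.0, 2.0, 3.0, 4.0, 5.0])

    # Pair each array with a name for its column.
    table = pa.Table.from_arrays([col1, col2, col3],
                                 names=["col1", "col2", "col3"])
    print(table)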
Create Table from Plain Types
=============================
@@ -122,6 +126,9 @@ from a variety of inputs, including plain python objects
pyarrow.Table
col1: int64
col2: string
+ ----
+ col1: [[1,2,3,4,5]]
+ col2: [["a","b","c","d","e"]]
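The plain-Python-objects case referenced in this hunk can be sketched like so (values assumed from the expected output):

    import pyarrow as pa

    # pa.table accepts plain Python objects, e.g. a dict of lists.
    table = pa.table({"col1": [1, 2, 3, 4, 5],
                      "col2": ["a", "b", "c", "d", "e"]})
    print(table)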
.. note::
@@ -167,6 +174,9 @@ Multiple batches can be combined into a table using
pyarrow.Table
odd: int64
even: int64
+ ----
+ odd: [[1,3,5,7,9],[11,13,15,17,19]]
+ even: [[2,4,6,8,10],[12,14,16,18,20]]
Equally, :class:`pyarrow.Table` can be converted to a list of
:class:`pyarrow.RecordBatch` using the :meth:`pyarrow.Table.to_batches`
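A minimal round-trip sketch of both directions (batch contents assumed from the expected output above):

    import pyarrow as pa

    batch1 = pa.RecordBatch.from_arrays(
        [pa.array([1, 3, 5, 7, 9]), pa.array([2, 4, 6, 8, 10])],
        names=["odd", "even"])
    batch2 = pa.RecordBatch.from_arrays(
        [pa.array([11, 13, 15, 17, 19]), pa.array([12, 14, 16, 18, 20])],
        names=["odd", "even"])

    # Combine the batches into a single table ...
    table = pa.Table.from_batches([batch1, batch2])
    # ... and split it back into a list of record batches.
    batches = table.to_batches()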
diff --git a/python/source/data.rst b/python/source/data.rst
index f00eed1..970e00c 100644
--- a/python/source/data.rst
+++ b/python/source/data.rst
@@ -166,12 +166,16 @@ We can combine them into a single table using
:func:`pyarrow.concat_tables`:
oscar_nominations = pa.concat_tables([oscar_nominations_1,
oscar_nominations_2])
-
- print(oscar_nominations.to_pydict())
+ print(oscar_nominations)
.. testoutput::
- {'actor': ['Meryl Streep', 'Katharine Hepburn', 'Jack Nicholson', 'Bette Davis'], 'nominations': [21, 12, 12, 10]}
+ pyarrow.Table
+ actor: string
+ nominations: int64
+ ----
+ actor: [["Meryl Streep","Katharine Hepburn"],["Jack Nicholson","Bette
Davis"]]
+ nominations: [[21,12],[12,10]]
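The chunking in this output follows directly from concatenation. A sketch with assumed input tables:

    import pyarrow as pa

    oscar_nominations_1 = pa.table({
        "actor": ["Meryl Streep", "Katharine Hepburn"],
        "nominations": [21, 12]})
    oscar_nominations_2 = pa.table({
        "actor": ["Jack Nicholson", "Bette Davis"],
        "nominations": [12, 10]})

    # Each input table becomes a separate chunk of the result.
    oscar_nominations = pa.concat_tables([oscar_nominations_1,
                                          oscar_nominations_2])
    print(oscar_nominations)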
.. note::
@@ -203,9 +207,12 @@ Suppose we have a table with oscar nominations for each actress
.. testoutput::
- pyarrow.Table
- actor: string
- nominations: int64
+ pyarrow.Table
+ actor: string
+ nominations: int64
+ ----
+ actor: [["Meryl Streep","Katharine Hepburn"]]
+ nominations: [[21,12]]
it's possible to append an additional column to track the years the
nomination was won using :meth:`pyarrow.Table.append_column`
@@ -224,11 +231,15 @@ nomination was won using :meth:`pyarrow.Table.append_column`
.. testoutput::
- pyarrow.Table
- actor: string
- nominations: int64
- wonyears: list<item: int64>
- child 0, item: int64
+ pyarrow.Table
+ actor: string
+ nominations: int64
+ wonyears: list<item: int64>
+ child 0, item: int64
+ ----
+ actor: [["Meryl Streep","Katharine Hepburn"]]
+ nominations: [[21,12]]
+ wonyears: [[[1980,1983,2012],[1934,1968,1969,1982]]]
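The appended column here is a list<int64>, which is why the output shows a nested child field. A sketch with assumed inputs:

    import pyarrow as pa

    table = pa.table({
        "actor": ["Meryl Streep", "Katharine Hepburn"],
        "nominations": [21, 12]})

    # Append a list<int64> column; the years are inferred from the
    # expected output.
    table = table.append_column(
        "wonyears",
        pa.array([[1980, 1983, 2012], [1934, 1968, 1969, 1982]]))
    print(table)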
Searching for values matching a predicate in Arrays
===================================================
diff --git a/python/source/io.rst b/python/source/io.rst
index 1071be5..2a8e128 100755
--- a/python/source/io.rst
+++ b/python/source/io.rst
@@ -67,14 +67,12 @@ the parquet file as :class:`ChunkedArray`
print(table)
- col1 = table["col1"]
- print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
-
.. testoutput::
pyarrow.Table
col1: int64
- ChunkedArray = 0 .. 99
+ ----
+ col1: [[0,1,2,3,4,5,6,7,8,9,...,90,91,92,93,94,95,96,97,98,99]]
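The testoutput above comes from reading a whole Parquet file back as a table. A self-contained sketch (file name assumed):

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Write a sample file, then read it back; each column comes back
    # as a ChunkedArray.
    pq.write_table(pa.table({"col1": list(range(100))}), "example.parquet")
    table = pq.read_table("example.parquet")
    print(table)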
Reading a subset of Parquet data
================================
@@ -102,15 +100,12 @@ documentation for details about the syntax for filters.
print(table)
- col1 = table["col1"]
- print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
-
.. testoutput::
pyarrow.Table
col1: int64
- ChunkedArray = 6 .. 9
-
+ ----
+ col1: [[6,7,8,9]]
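The filtered read can be sketched with the filters argument of read_table (file name assumed; the predicate matches the 6..9 output):

    import pyarrow.parquet as pq

    # Keep only rows where 5 < col1 < 10.
    table = pq.read_table("example.parquet",
                          filters=[("col1", ">", 5), ("col1", "<", 10)])
    print(table)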
Saving Arrow Arrays to disk
===========================
@@ -228,14 +223,12 @@ provided to :func:`pyarrow.csv.read_csv` to drive
print(table)
- col1 = table["col1"]
- print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
-
.. testoutput::
pyarrow.Table
col1: int64
- ChunkedArray = 0 .. 99
+ ----
+ col1: [[0,1,2,3,4,5,6,7,8,9,...,90,91,92,93,94,95,96,97,98,99]]
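The cookbook section passes options to read_csv to drive the parse; a minimal sketch of the basic call that yields this shape of output (file name assumed):

    from pyarrow import csv

    # read_csv infers the column types and returns a pyarrow.Table.
    table = csv.read_csv("example.csv")
    print(table)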
Writing Partitioned Datasets
============================
@@ -286,15 +279,15 @@ column each with a file containing the subset of the data for that partition:
.. testoutput::
./partitioned/2000/part-0.parquet
- ./partitioned/2001/part-1.parquet
- ./partitioned/2002/part-2.parquet
- ./partitioned/2003/part-3.parquet
- ./partitioned/2004/part-4.parquet
- ./partitioned/2005/part-6.parquet
- ./partitioned/2006/part-5.parquet
- ./partitioned/2007/part-7.parquet
- ./partitioned/2008/part-8.parquet
- ./partitioned/2009/part-9.parquet
+ ./partitioned/2001/part-0.parquet
+ ./partitioned/2002/part-0.parquet
+ ./partitioned/2003/part-0.parquet
+ ./partitioned/2004/part-0.parquet
+ ./partitioned/2005/part-0.parquet
+ ./partitioned/2006/part-0.parquet
+ ./partitioned/2007/part-0.parquet
+ ./partitioned/2008/part-0.parquet
+ ./partitioned/2009/part-0.parquet
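The renamed expected paths reflect pyarrow 6.0.0's dataset writer, where each partition directory now gets its own part-0.parquet. A minimal sketch of writing such a layout (table contents assumed):

    import pyarrow as pa
    import pyarrow.dataset as ds

    table = pa.table({"year": [2000, 2001, 2002],
                      "value": [1.0, 2.0, 3.0]})

    # Directory partitioning on "year"; as of 6.0.0 the file counter
    # restarts in every partition directory.
    ds.write_dataset(
        table, "./partitioned", format="parquet",
        partitioning=ds.partitioning(pa.schema([("year", pa.int64())])))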
Reading Partitioned data
========================
@@ -354,14 +347,12 @@ expose them as a single Table.
table = dataset.to_table()
print(table)
- col1 = table["col1"]
- print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
-
.. testoutput::
pyarrow.Table
col1: int64
- ChunkedArray = 0 .. 29
+ ----
+ col1: [[0,1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18,19],[20,21,22,23,24,25,26,27,28,29]]
Notice that converting to a table will force all data to be loaded
in memory. For big datasets this is usually not what you want.
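When the full table would be too large, one alternative is to stream record batches instead of materializing everything (a sketch; process is a hypothetical per-batch handler):

    import pyarrow.dataset as ds

    dataset = ds.dataset("./partitioned", format="parquet")

    # Iterate lazily, one record batch at a time.
    for batch in dataset.to_batches():
        process(batch)  # hypothetical per-batch processing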
@@ -533,14 +524,12 @@ the parquet file as :class:`ChunkedArray`
print(table)
- col1 = table["col1"]
- print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
-
.. testoutput::
pyarrow.Table
col1: int64
- ChunkedArray = 0 .. 99
+ ----
+ col1: [[0,1,2,3,4,5,6,7,8,9,...,90,91,92,93,94,95,96,97,98,99]]
Reading Line Delimited JSON
===========================
@@ -650,6 +639,8 @@ by simply invoking :meth:`pyarrow.feather.read_table` and
pyarrow.Table
numbers: int64
+ ----
+ numbers: [[1,2,3,4,5]]
.. testcode::
@@ -660,6 +651,8 @@ by simply invoking :meth:`pyarrow.feather.read_table` and
pyarrow.Table
numbers: int64
+ ----
+ numbers: [[1,2,3,4,5]]
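Both outputs above come from feather.read_table. A self-contained sketch (file name and contents assumed):

    import pyarrow as pa
    from pyarrow import feather

    feather.write_feather(pa.table({"numbers": [1, 2, 3, 4, 5]}),
                          "example.feather")
    table = feather.read_table("example.feather")
    print(table)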
Reading data from formats that don't have native support for
compression instead involves decompressing them before decoding them.
@@ -679,6 +672,8 @@ For example to read a compressed CSV file:
pyarrow.Table
numbers: int64
+ ----
+ numbers: [[1,2,3,4,5]]
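A sketch of the decompress-then-decode pattern for a gzip-compressed CSV (file name assumed):

    import pyarrow as pa
    from pyarrow import csv

    # Wrap the file in a decompressing stream before handing it to the
    # CSV reader.
    with pa.CompressedInputStream(pa.OSFile("example.csv.gz"), "gzip") as f:
        table = csv.read_csv(f)
    print(table)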
.. note::
@@ -696,3 +691,5 @@ For example to read a compressed CSV file:
pyarrow.Table
numbers: int64
+ ----
+ numbers: [[1,2,3,4,5]]
diff --git a/python/source/schema.rst b/python/source/schema.rst
index dcede35..141cb0e 100644
--- a/python/source/schema.rst
+++ b/python/source/schema.rst
@@ -87,6 +87,10 @@ The schema can then be provided to a table when created:
col1: int8
col2: string
col3: double
+ ----
+ col1: [[1,2,3,4,5]]
+ col2: [["a","b","c","d","e"]]
+ col3: [[1,2,3,4,5]]
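A sketch of creating a table against an explicit schema, which is what makes col1 come out as int8 rather than the inferred int64 (values assumed):

    import pyarrow as pa

    schema = pa.schema([("col1", pa.int8()),
                        ("col2", pa.string()),
                        ("col3", pa.float64())])

    # The provided schema overrides type inference.
    table = pa.table({"col1": [1, 2, 3, 4, 5],
                      "col2": ["a", "b", "c", "d", "e"],
                      "col3": [1.0, 2.0, 3.0, 4.0, 5.0]},
                     schema=schema)
    print(table)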
As with arrays, it's possible to cast tables to different schemas,
as long as they are compatible
@@ -109,6 +113,10 @@ as far as they are compatible
col1: int32
col2: string
col3: double
+ ----
+ col1: [[1,2,3,4,5]]
+ col2: [["a","b","c","d","e"]]
+ col3: [[1,2,3,4,5]]
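The int32 col1 in this output comes from a cast. A sketch (starting table assumed):

    import pyarrow as pa

    table = pa.table({"col1": [1, 2, 3, 4, 5],
                      "col2": ["a", "b", "c", "d", "e"],
                      "col3": [1.0, 2.0, 3.0, 4.0, 5.0]})

    # int64 -> int32 is a compatible cast for these values.
    table = table.cast(pa.schema([("col1", pa.int32()),
                                  ("col2", pa.string()),
                                  ("col3", pa.float64())]))
    print(table)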
Merging multiple schemas
========================