This is an automated email from the ASF dual-hosted git repository.
thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git
The following commit(s) were added to refs/heads/main by this push:
new 42507ed Update python tests for version 6.0.0 (#98)
42507ed is described below
commit 42507edcbda72dd11beb5c085847a58d289fad64
Author: Alessandro Molina <[email protected]>
AuthorDate: Thu Oct 28 14:18:59 2021 +0200
Update python tests for version 6.0.0 (#98)
* Update python tests for version 6.0.0
* Adapt to new partitioning convention
---
python/source/create.rst | 10 +++++++++
python/source/data.rst | 33 ++++++++++++++++++----------
python/source/io.rst | 57 +++++++++++++++++++++++-------------------------
python/source/schema.rst | 8 +++++++
4 files changed, 67 insertions(+), 41 deletions(-)
diff --git a/python/source/create.rst b/python/source/create.rst
index 28773d9..b9ca220 100644
--- a/python/source/create.rst
+++ b/python/source/create.rst
@@ -94,6 +94,10 @@ by pairing multiple arrays with names for their columns
col1: int64
col2: string
col3: double
+ ----
+ col1: [[1,2,3,4,5]]
+ col2: [["a","b","c","d","e"]]
+ col3: [[1,2,3,4,5]]
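For reference, the table in this testoutput can be built roughly as follows (a minimal sketch; the array values are inferred from the expected output, not taken from the cookbook source):

    import pyarrow as pa

    col1 = pa.array([1, 2, 3, 4, 5])
    col2 = pa.array(["a", "b", "c", "d", "e"])
    col3 = pa.array([1.0, 2.0, 3.0, 4.0, 5.0])

    # Pair each array with a name for its column.
    table = pa.Table.from_arrays([col1, col2, col3],
                                 names=["col1", "col2", "col3"])
    print(table)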
Create Table from Plain Types
=============================
@@ -122,6 +126,9 @@ from a variety of inputs, including plain python objects
pyarrow.Table
col1: int64
col2: string
+ ----
+ col1: [[1,2,3,4,5]]
+ col2: [["a","b","c","d","e"]]
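The plain-Python-objects case referenced in this hunk can be sketched like so (values assumed from the expected output):

    import pyarrow as pa

    # pa.table accepts plain Python objects, e.g. a dict of lists.
    table = pa.table({"col1": [1, 2, 3, 4, 5],
                      "col2": ["a", "b", "c", "d", "e"]})
    print(table)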
.. note::
@@ -167,6 +174,9 @@ Multiple batches can be combined into a table using
pyarrow.Table
odd: int64
even: int64
+ ----
+ odd: [[1,3,5,7,9],[11,13,15,17,19]]
+ even: [[2,4,6,8,10],[12,14,16,18,20]]
Equally, :class:`pyarrow.Table` can be converted to a list of
:class:`pyarrow.RecordBatch` using the :meth:`pyarrow.Table.to_batches`
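A minimal round-trip sketch of both directions (batch contents assumed from the expected output above):

    import pyarrow as pa

    batch1 = pa.RecordBatch.from_arrays(
        [pa.array([1, 3, 5, 7, 9]), pa.array([2, 4, 6, 8, 10])],
        names=["odd", "even"])
    batch2 = pa.RecordBatch.from_arrays(
        [pa.array([11, 13, 15, 17, 19]), pa.array([12, 14, 16, 18, 20])],
        names=["odd", "even"])

    # Combine the batches into a single table ...
    table = pa.Table.from_batches([batch1, batch2])
    # ... and split it back into a list of record batches.
    batches = table.to_batches()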
diff --git a/python/source/data.rst b/python/source/data.rst
index f00eed1..970e00c 100644
--- a/python/source/data.rst
+++ b/python/source/data.rst
@@ -166,12 +166,16 @@ We can combine them into a single table using
:func:`pyarrow.concat_tables`:
oscar_nominations = pa.concat_tables([oscar_nominations_1,
oscar_nominations_2])
-
- print(oscar_nominations.to_pydict())
+ print(oscar_nominations)
.. testoutput::
- {'actor': ['Meryl Streep', 'Katharine Hepburn', 'Jack Nicholson', 'Bette Davis'], 'nominations': [21, 12, 12, 10]}
+ pyarrow.Table
+ actor: string
+ nominations: int64
+ ----
+ actor: [["Meryl Streep","Katharine Hepburn"],["Jack Nicholson","Bette
Davis"]]
+ nominations: [[21,12],[12,10]]
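The chunking in this output follows directly from concatenation. A sketch with assumed input tables:

    import pyarrow as pa

    oscar_nominations_1 = pa.table({
        "actor": ["Meryl Streep", "Katharine Hepburn"],
        "nominations": [21, 12]})
    oscar_nominations_2 = pa.table({
        "actor": ["Jack Nicholson", "Bette Davis"],
        "nominations": [12, 10]})

    # Each input table becomes a separate chunk of the result.
    oscar_nominations = pa.concat_tables([oscar_nominations_1,
                                          oscar_nominations_2])
    print(oscar_nominations)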
.. note::
@@ -203,9 +207,12 @@ Suppose we have a table with oscar nominations for each actress
.. testoutput::
- pyarrow.Table
- actor: string
- nominations: int64
+ pyarrow.Table
+ actor: string
+ nominations: int64
+ ----
+ actor: [["Meryl Streep","Katharine Hepburn"]]
+ nominations: [[21,12]]
it's possible to append an additional column to track the years the
nomination was won using :meth:`pyarrow.Table.append_column`
@@ -224,11 +231,15 @@ nomination was won using :meth:`pyarrow.Table.append_column`
.. testoutput::
- pyarrow.Table
- actor: string
- nominations: int64
- wonyears: list<item: int64>
- child 0, item: int64
+ pyarrow.Table
+ actor: string
+ nominations: int64
+ wonyears: list<item: int64>
+ child 0, item: int64
+ ----
+ actor: [["Meryl Streep","Katharine Hepburn"]]
+ nominations: [[21,12]]
+ wonyears: [[[1980,1983,2012],[1934,1968,1969,1982]]]
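The appended column here is a list<int64>, which is why the output shows a nested child field. A sketch with assumed inputs:

    import pyarrow as pa

    table = pa.table({
        "actor": ["Meryl Streep", "Katharine Hepburn"],
        "nominations": [21, 12]})

    # Append a list<int64> column; the years are inferred from the
    # expected output.
    table = table.append_column(
        "wonyears",
        pa.array([[1980, 1983, 2012], [1934, 1968, 1969, 1982]]))
    print(table)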
Searching for values matching a predicate in Arrays
===================================================
diff --git a/python/source/io.rst b/python/source/io.rst
index 1071be5..2a8e128 100755
--- a/python/source/io.rst
+++ b/python/source/io.rst
@@ -67,14 +67,12 @@ the parquet file as :class:`ChunkedArray`
print(table)
- col1 = table["col1"]
- print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
-
.. testoutput::
pyarrow.Table
col1: int64
- ChunkedArray = 0 .. 99
+ ----
+ col1: [[0,1,2,3,4,5,6,7,8,9,...,90,91,92,93,94,95,96,97,98,99]]
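The testoutput above comes from reading a whole Parquet file back as a table. A self-contained sketch (file name assumed):

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Write a sample file, then read it back; each column comes back
    # as a ChunkedArray.
    pq.write_table(pa.table({"col1": list(range(100))}), "example.parquet")
    table = pq.read_table("example.parquet")
    print(table)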
Reading a subset of Parquet data
================================
@@ -102,15 +100,12 @@ documentation for details about the syntax for filters.
print(table)
- col1 = table["col1"]
- print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
-
.. testoutput::
pyarrow.Table
col1: int64
- ChunkedArray = 6 .. 9
-
+ ----
+ col1: [[6,7,8,9]]
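The filtered read can be sketched with the filters argument of read_table (file name assumed; the predicate matches the 6..9 output):

    import pyarrow.parquet as pq

    # Keep only rows where 5 < col1 < 10.
    table = pq.read_table("example.parquet",
                          filters=[("col1", ">", 5), ("col1", "<", 10)])
    print(table)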
Saving Arrow Arrays to disk
===========================
@@ -228,14 +223,12 @@ provided to :func:`pyarrow.csv.read_csv` to drive
print(table)
- col1 = table["col1"]
- print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
-
.. testoutput::
pyarrow.Table
col1: int64
- ChunkedArray = 0 .. 99
+ ----
+ col1: [[0,1,2,3,4,5,6,7,8,9,...,90,91,92,93,94,95,96,97,98,99]]
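The cookbook section passes options to read_csv to drive the parse; a minimal sketch of the basic call that yields this shape of output (file name assumed):

    from pyarrow import csv

    # read_csv infers the column types and returns a pyarrow.Table.
    table = csv.read_csv("example.csv")
    print(table)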
Writing Partitioned Datasets
============================
@@ -286,15 +279,15 @@ column each with a file containing the subset of the data for that partition:
.. testoutput::
./partitioned/2000/part-0.parquet
- ./partitioned/2001/part-1.parquet
- ./partitioned/2002/part-2.parquet
- ./partitioned/2003/part-3.parquet
- ./partitioned/2004/part-4.parquet
- ./partitioned/2005/part-6.parquet
- ./partitioned/2006/part-5.parquet
- ./partitioned/2007/part-7.parquet
- ./partitioned/2008/part-8.parquet
- ./partitioned/2009/part-9.parquet
+ ./partitioned/2001/part-0.parquet
+ ./partitioned/2002/part-0.parquet
+ ./partitioned/2003/part-0.parquet
+ ./partitioned/2004/part-0.parquet
+ ./partitioned/2005/part-0.parquet
+ ./partitioned/2006/part-0.parquet
+ ./partitioned/2007/part-0.parquet
+ ./partitioned/2008/part-0.parquet
+ ./partitioned/2009/part-0.parquet
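The renamed expected paths reflect pyarrow 6.0.0's dataset writer, where each partition directory now gets its own part-0.parquet. A minimal sketch of writing such a layout (table contents assumed):

    import pyarrow as pa
    import pyarrow.dataset as ds

    table = pa.table({"year": [2000, 2001, 2002],
                      "value": [1.0, 2.0, 3.0]})

    # Directory partitioning on "year"; as of 6.0.0 the file counter
    # restarts in every partition directory.
    ds.write_dataset(
        table, "./partitioned", format="parquet",
        partitioning=ds.partitioning(pa.schema([("year", pa.int64())])))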
Reading Partitioned data
========================
@@ -354,14 +347,12 @@ expose them as a single Table.
table = dataset.to_table()
print(table)
- col1 = table["col1"]
- print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
-
.. testoutput::
pyarrow.Table
col1: int64
- ChunkedArray = 0 .. 29
+ ----
+ col1: [[0,1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18,19],[20,21,22,23,24,25,26,27,28,29]]
Notice that converting to a table will force all data to be loaded
in memory. For big datasets this is usually not what you want.
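When the full table would be too large, one alternative is to stream record batches instead of materializing everything (a sketch; process is a hypothetical per-batch handler):

    import pyarrow.dataset as ds

    dataset = ds.dataset("./partitioned", format="parquet")

    # Iterate lazily, one record batch at a time.
    for batch in dataset.to_batches():
        process(batch)  # hypothetical per-batch processing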
@@ -533,14 +524,12 @@ the parquet file as :class:`ChunkedArray`
print(table)
- col1 = table["col1"]
- print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
-
.. testoutput::
pyarrow.Table
col1: int64
- ChunkedArray = 0 .. 99
+ ----
+ col1: [[0,1,2,3,4,5,6,7,8,9,...,90,91,92,93,94,95,96,97,98,99]]
Reading Line Delimited JSON
===========================
@@ -650,6 +639,8 @@ by simply invoking :meth:`pyarrow.feather.read_table` and
pyarrow.Table
numbers: int64
+ ----
+ numbers: [[1,2,3,4,5]]
.. testcode::
@@ -660,6 +651,8 @@ by simply invoking :meth:`pyarrow.feather.read_table` and
pyarrow.Table
numbers: int64
+ ----
+ numbers: [[1,2,3,4,5]]
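Both outputs above come from feather.read_table. A self-contained sketch (file name and contents assumed):

    import pyarrow as pa
    from pyarrow import feather

    feather.write_feather(pa.table({"numbers": [1, 2, 3, 4, 5]}),
                          "example.feather")
    table = feather.read_table("example.feather")
    print(table)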
Reading data from formats that don't have native support for
compression instead involves decompressing them before decoding them.
@@ -679,6 +672,8 @@ For example to read a compressed CSV file:
pyarrow.Table
numbers: int64
+ ----
+ numbers: [[1,2,3,4,5]]
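A sketch of the decompress-then-decode pattern for a gzip-compressed CSV (file name assumed):

    import pyarrow as pa
    from pyarrow import csv

    # Wrap the file in a decompressing stream before handing it to the
    # CSV reader.
    with pa.CompressedInputStream(pa.OSFile("example.csv.gz"), "gzip") as f:
        table = csv.read_csv(f)
    print(table)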
.. note::
@@ -696,3 +691,5 @@ For example to read a compressed CSV file:
pyarrow.Table
numbers: int64
+ ----
+ numbers: [[1,2,3,4,5]]
diff --git a/python/source/schema.rst b/python/source/schema.rst
index dcede35..141cb0e 100644
--- a/python/source/schema.rst
+++ b/python/source/schema.rst
@@ -87,6 +87,10 @@ The schema can then be provided to a table when created:
col1: int8
col2: string
col3: double
+ ----
+ col1: [[1,2,3,4,5]]
+ col2: [["a","b","c","d","e"]]
+ col3: [[1,2,3,4,5]]
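A sketch of creating a table against an explicit schema, which is what makes col1 come out as int8 rather than the inferred int64 (values assumed):

    import pyarrow as pa

    schema = pa.schema([("col1", pa.int8()),
                        ("col2", pa.string()),
                        ("col3", pa.float64())])

    # The provided schema overrides type inference.
    table = pa.table({"col1": [1, 2, 3, 4, 5],
                      "col2": ["a", "b", "c", "d", "e"],
                      "col3": [1.0, 2.0, 3.0, 4.0, 5.0]},
                     schema=schema)
    print(table)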
As with arrays, it's possible to cast tables to different schemas,
as long as they are compatible
@@ -109,6 +113,10 @@ as far as they are compatible
col1: int32
col2: string
col3: double
+ ----
+ col1: [[1,2,3,4,5]]
+ col2: [["a","b","c","d","e"]]
+ col3: [[1,2,3,4,5]]
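The int32 col1 in this output comes from a cast. A sketch (starting table assumed):

    import pyarrow as pa

    table = pa.table({"col1": [1, 2, 3, 4, 5],
                      "col2": ["a", "b", "c", "d", "e"],
                      "col3": [1.0, 2.0, 3.0, 4.0, 5.0]})

    # int64 -> int32 is a compatible cast for these values.
    table = table.cast(pa.schema([("col1", pa.int32()),
                                  ("col2", pa.string()),
                                  ("col3", pa.float64())]))
    print(table)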
Merging multiple schemas
========================