amol- commented on a change in pull request #1:
URL: https://github.com/apache/arrow-cookbook/pull/1#discussion_r676503310
##########
File path: python/source/io.rst
##########
@@ -0,0 +1,424 @@
+========================
+Reading and Writing Data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Write a Parquet file
+====================
+
+.. testsetup::
+
+ import numpy as np
+ import pyarrow as pa
+
+ arr = pa.array(np.arange(100))
+
+Given an array with the numbers from 0 to 99
+
+.. testcode::
+
+ print(f"{arr[0]} .. {arr[-1]}")
+
+.. testoutput::
+
+ 0 .. 99
+
+As Parquet is a columnar format, to write the array to a Parquet
+file we must first wrap it in a :class:`pyarrow.Table`: this gives
+us a table with a single column, which can then be written out.
+
+.. testcode::
+
+ table = pa.Table.from_arrays([arr], names=["col1"])
+
+Once we have a table, it can be written to a Parquet file
+using the functions provided by the ``pyarrow.parquet`` module
+
+.. testcode::
+
+ import pyarrow.parquet as pq
+
+ pq.write_table(table, "example.parquet", compression=None)
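+
+Here ``compression=None`` switches compression off; by default
+:func:`pyarrow.parquet.write_table` compresses the data with
+Snappy. As a sketch (the ``example_gzip.parquet`` file name is
+just an example), an explicit codec can be requested with
+
+.. testcode::
+
+    # Sketch: write the same table, compressed with gzip instead.
+    pq.write_table(table, "example_gzip.parquet", compression="gzip")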
+
+Reading a Parquet file
+======================
+
+Given a Parquet file, it can be read back to a :class:`pyarrow.Table`
+by using the :func:`pyarrow.parquet.read_table` function
+
+.. testcode::
+
+ import pyarrow.parquet as pq
+
+ table = pq.read_table("example.parquet")
+
+The resulting table will contain the same columns that existed in
+the Parquet file, each exposed as a :class:`pyarrow.ChunkedArray`
+
+.. testcode::
+
+ print(table)
+
+ col1 = table["col1"]
+ print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
+
+.. testoutput::
+
+ pyarrow.Table
+ col1: int64
+ ChunkedArray = 0 .. 99
+
+Reading a subset of Parquet data
+================================
+
+When reading a Parquet file with :func:`pyarrow.parquet.read_table`
+it is possible to restrict which columns and rows will be read
+into memory by using the ``columns`` and ``filters`` arguments
+
+.. testcode::
+
+ import pyarrow.parquet as pq
+
+ table = pq.read_table("example.parquet",
+ columns=["col1"],
+ filters=[
+ ("col1", ">", 5),
+ ("col1", "<", 10),
+ ])
+
+The resulting table will contain only the projected columns
+and filtered rows. Refer to the :func:`pyarrow.parquet.read_table`
+documentation for details about the syntax for filters.
+
+.. testcode::
+
+ print(table)
+
+ col1 = table["col1"]
+ print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
+
+.. testoutput::
+
+ pyarrow.Table
+ col1: int64
+ ChunkedArray = 6 .. 9
+
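+The tuples in ``filters`` are implicitly combined with ``AND``.
+To express ``OR``, the filters can also be given in disjunctive
+normal form, as a list of lists of tuples. As a sketch, the rows
+below 3 or above 95 could be selected with
+
+.. testcode::
+
+    # OR of two single-predicate groups: col1 < 3 OR col1 > 95
+    table = pq.read_table("example.parquet",
+                          filters=[
+                              [("col1", "<", 3)],
+                              [("col1", ">", 95)],
+                          ])
+    print(table["col1"].to_pylist())
+
+.. testoutput::
+
+    [0, 1, 2, 96, 97, 98, 99]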
+
+Saving Arrow Arrays to disk
+===========================
+
+Apart from using Arrow to read and save common file formats like
+Parquet, it is possible to dump data in the raw Arrow format, which
+allows direct memory mapping of data from disk. This format is
+called the Arrow IPC format.
+
+Given an array with the numbers from 0 to 99
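+
+One possible way to save such an array in the IPC format (a
+sketch: the ``example.arrow`` file name is just an example) is to
+wrap it in a record batch and hand it to :func:`pyarrow.ipc.new_file`
+
+.. testcode::
+
+    # Sketch: wrap the array in a single-column record batch
+    # and save it in the Arrow IPC file format.
+    batch = pa.record_batch([arr], names=["col1"])
+
+    with pa.OSFile("example.arrow", "wb") as sink:
+        with pa.ipc.new_file(sink, batch.schema) as writer:
+            writer.write_batch(batch)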
Review comment:
rephrased