thisisnic commented on a change in pull request #87:
URL: https://github.com/apache/arrow-cookbook/pull/87#discussion_r733458521
##########
File path: python/source/io.rst
##########
@@ -577,4 +577,121 @@ The content of the file can be read back to a
:class:`pyarrow.Table` using
.. testoutput::
- {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
\ No newline at end of file
+ {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
+
+Writing Compressed Data
+=======================
+
+Arrow provides support for writing files in compressed format,
+both for formats that provide it natively like Parquet or Feather,
+and for formats that don't support it out of the box like CSV.
+
+Given a table:
+
+.. testcode::
+
+ table = pa.table([
+ pa.array([1, 2, 3, 4, 5])
+ ], names=["numbers"])
+
+Writing it compressed to parquet or feather requires passing the
+``compression`` argument to the :func:`pyarrow.feather.write_feather` and
+:func:`pyarrow.parquet.write_table` functions:
+
+.. testcode::
+
+ pa.feather.write_feather(table, "compressed.feather",
+ compression="lz4")
+ pa.parquet.write_table(table, "compressed.parquet",
+ compression="lz4")
+
+You can refer to the two functions documentation for a complete
Review comment:
```suggestion
You can refer to each of those functions' documentation for a complete
```
##########
File path: python/source/io.rst
##########
@@ -577,4 +577,121 @@ The content of the file can be read back to a
:class:`pyarrow.Table` using
.. testoutput::
- {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
\ No newline at end of file
+ {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
+
+Writing Compressed Data
+=======================
+
+Arrow provides support for writing files in compressed format,
+both for formats that provide it natively like Parquet or Feather,
Review comment:
```suggestion
both for formats that provide compression natively like Parquet or Feather,
```
##########
File path: python/source/io.rst
##########
@@ -577,4 +577,121 @@ The content of the file can be read back to a
:class:`pyarrow.Table` using
.. testoutput::
- {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
\ No newline at end of file
+ {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
+
+Writing Compressed Data
+=======================
+
+Arrow provides support for writing files in compressed format,
+both for formats that provide it natively like Parquet or Feather,
+and for formats that don't support it out of the box like CSV.
+
+Given a table:
+
+.. testcode::
+
+ table = pa.table([
+ pa.array([1, 2, 3, 4, 5])
+ ], names=["numbers"])
+
+Writing it compressed to parquet or feather requires passing the
+``compression`` argument to the :func:`pyarrow.feather.write_feather` and
+:func:`pyarrow.parquet.write_table` functions:
+
+.. testcode::
+
+ pa.feather.write_feather(table, "compressed.feather",
+ compression="lz4")
+ pa.parquet.write_table(table, "compressed.parquet",
+ compression="lz4")
+
+You can refer to the two functions documentation for a complete
+list of supported compression formats.
+
+.. note::
+
+ Arrow actually uses compression by default when writing
+ parquet or feather files. Feather is compressed using ``lz4``
+ by default and Parquet uses ``snappy`` by default.
+
+For formats that don't support compression natively, like CSV,
+it's possible to save compressed data using
+:class:`pyarrow.CompressedOutputStream`:
+
+.. testcode::
+
+ with pa.CompressedOutputStream("compressed.csv.gz", "gzip") as out:
+ pa.csv.write_csv(table, out)
+
+This requires decompressing the file when reading it back,
+which can be done using :class:`pyarrow.CompressedInputStream`
+as explained in the next recipe.
+
+Reading Compressed Data
+=======================
+
+Arrow provides support for reading compressed files,
+both for formats that provide it natively like Parquet or Feather,
+and for formats that don't support it out of the box like CSV.
Review comment:
I was thinking this could be a tiny bit clearer, though I think the
"external application" phrasing needs tweaking a bit.
```
Arrow provides support for reading compressed files,
both for formats that provide it natively like Parquet or Feather,
and for files in formats that don't support compression natively,
like CSV, but have been compressed by an external application.
```
##########
File path: python/source/io.rst
##########
@@ -577,4 +577,121 @@ The content of the file can be read back to a
:class:`pyarrow.Table` using
.. testoutput::
- {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
\ No newline at end of file
+ {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
+
+Writing Compressed Data
+=======================
+
+Arrow provides support for writing files in compressed format,
+both for formats that provide it natively like Parquet or Feather,
+and for formats that don't support it out of the box like CSV.
+
+Given a table:
+
+.. testcode::
+
+ table = pa.table([
+ pa.array([1, 2, 3, 4, 5])
+ ], names=["numbers"])
+
+Writing it compressed to parquet or feather requires passing the
Review comment:
```suggestion
Writing it compressed to Parquet or Feather requires passing the
```
##########
File path: python/source/io.rst
##########
@@ -577,4 +577,121 @@ The content of the file can be read back to a
:class:`pyarrow.Table` using
.. testoutput::
- {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
\ No newline at end of file
+ {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
+
+Writing Compressed Data
+=======================
+
+Arrow provides support for writing files in compressed format,
Review comment:
```suggestion
Arrow provides support for writing files in compressed formats,
```
##########
File path: python/source/io.rst
##########
@@ -577,4 +577,121 @@ The content of the file can be read back to a
:class:`pyarrow.Table` using
.. testoutput::
- {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
\ No newline at end of file
+ {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
+
+Writing Compressed Data
+=======================
+
+Arrow provides support for writing files in compressed format,
+both for formats that provide it natively like Parquet or Feather,
+and for formats that don't support it out of the box like CSV.
Review comment:
```suggestion
and for formats that don't support compression out of the box like CSV.
```
##########
File path: python/source/io.rst
##########
@@ -577,4 +577,121 @@ The content of the file can be read back to a
:class:`pyarrow.Table` using
.. testoutput::
- {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
\ No newline at end of file
+ {'a': [1, 3, 5, 7], 'b': [2.0, 3.0, 4.0, 5.0], 'c': [1, 2, 3, 4]}
+
+Writing Compressed Data
+=======================
+
+Arrow provides support for writing files in compressed format,
+both for formats that provide it natively like Parquet or Feather,
+and for formats that don't support it out of the box like CSV.
+
+Given a table:
+
+.. testcode::
+
+ table = pa.table([
+ pa.array([1, 2, 3, 4, 5])
+ ], names=["numbers"])
+
+Writing it compressed to parquet or feather requires passing the
+``compression`` argument to the :func:`pyarrow.feather.write_feather` and
+:func:`pyarrow.parquet.write_table` functions:
+
+.. testcode::
+
+ pa.feather.write_feather(table, "compressed.feather",
+ compression="lz4")
+ pa.parquet.write_table(table, "compressed.parquet",
+ compression="lz4")
+
+You can refer to the two functions documentation for a complete
+list of supported compression formats.
+
+.. note::
+
+ Arrow actually uses compression by default when writing
+ parquet or feather files. Feather is compressed using ``lz4``
Review comment:
```suggestion
Parquet or Feather files. Feather is compressed using ``lz4``
```