[GitHub] [arrow-cookbook] westonpace commented on a change in pull request #50: Update CSV recipe to use pyarrow.csv instead of pandas

GitBox Fri, 27 Aug 2021 14:40:39 -0700


westonpace commented on a change in pull request #50:
URL: https://github.com/apache/arrow-cookbook/pull/50#discussion_r697731958




##########
File path: python/source/io.rst
##########
@@ -180,13 +177,36 @@ Writing CSV files
 =================
 
 It is currently possible to write an Arrow :class:`pyarrow.Table` to
-CSV by going through pandas. Arrow doesn't currently provide an optimized
-code path for writing to CSV.
+a CSV file using the :func:`pyarrow.csv.write_csv` function
 
 .. testcode::
 
+    arr = pa.array(range(100))
     table = pa.Table.from_arrays([arr], names=["col1"])
-    table.to_pandas().to_csv("table.csv", index=False)
+    
+    import pyarrow.csv
+    pa.csv.write_csv(table, "table.csv",
+                     write_options=pa.csv.WriteOptions(include_header=True))

Review comment:
       There has been some movement in 
https://github.com/apache/arrow-cookbook/pull/2 toward avoiding `testsetup` 
(which is hidden) in favor of fully standalone `testcode` blocks, at the risk 
of some duplication.

##########
File path: python/source/io.rst
##########
@@ -180,13 +177,36 @@ Writing CSV files
 =================
 
 It is currently possible to write an Arrow :class:`pyarrow.Table` to
-CSV by going through pandas. Arrow doesn't currently provide an optimized
-code path for writing to CSV.
+a CSV file using the :func:`pyarrow.csv.write_csv` function
 
 .. testcode::
 
+    arr = pa.array(range(100))
     table = pa.Table.from_arrays([arr], names=["col1"])
-    table.to_pandas().to_csv("table.csv", index=False)
+    
+    import pyarrow.csv
+    pa.csv.write_csv(table, "table.csv",
+                     write_options=pa.csv.WriteOptions(include_header=True))
+
+Writing CSV files incrementally
+===============================
+
+If you need to append to write data to a CSV file incrementally
+as you generate or retrieve the data and you don't want to keep
+in memory the whole table to write it at once, it's possible to use
+:class:`pyarrow.csv.CSVWriter` to write data incrementally
+
+.. testcode::
+
+    schema = pa.schema([("col1", pa.int32())])
+    with pa.csv.CSVWriter("table.csv", schema=schema) as writer:
+        for chunk in range(10):
+            datachunk = range(chunk*10, (chunk+1)*10)
+            table = pa.Table.from_arrays([pa.array(datachunk)], schema=schema)
+            writer.write(table)
+
+Apart tables, it's equally possible to write :class:`pyarrow.RecordBatch`
+just passing them as you would for tables.

Review comment:
       ```suggestion
   It's equally possible to write :class:`pyarrow.RecordBatch`
   by passing them as you would for tables.
   ```

##########
File path: python/source/io.rst
##########
@@ -180,13 +177,36 @@ Writing CSV files
 =================
 
 It is currently possible to write an Arrow :class:`pyarrow.Table` to
-CSV by going through pandas. Arrow doesn't currently provide an optimized
-code path for writing to CSV.
+a CSV file using the :func:`pyarrow.csv.write_csv` function
 
 .. testcode::
 
+    arr = pa.array(range(100))
     table = pa.Table.from_arrays([arr], names=["col1"])
-    table.to_pandas().to_csv("table.csv", index=False)
+    
+    import pyarrow.csv
+    pa.csv.write_csv(table, "table.csv",
+                     write_options=pa.csv.WriteOptions(include_header=True))
+
+Writing CSV files incrementally
+===============================
+
+If you need to append to write data to a CSV file incrementally

Review comment:
       ```suggestion
   If you need to write data to a CSV file incrementally
   ```




