pitrou commented on code in PR #14722:
URL: https://github.com/apache/arrow/pull/14722#discussion_r1031580606
##########
python/pyarrow/tests/test_csv.py:
##########
@@ -1908,6 +1908,20 @@ def test_write_read_round_trip():
parse_options=parse_options)
+def test_write_quoting_style():
+ t = pa.Table.from_arrays([[1, 2, 3], ["a", "b", "c"]], ["c1", "c2"])
+ buf = io.BytesIO()
+ for write_options, res in [
+ (WriteOptions(quoting_style='none'), b'"c1","c2"\n1,a\n2,b\n3,c\n'),
+ (WriteOptions(), b'"c1","c2"\n1,"a"\n2,"b"\n3,"c"\n'),
+ (WriteOptions(quoting_style='all_valid'),
b'"c1","c2"\n"1","a"\n"2","b"\n"3","c"\n'),
+ ]:
+ with CSVWriter(buf, t.schema, write_options=write_options) as writer:
+ writer.write_table(t)
+ buf.seek(0)
+ assert buf.getvalue() == res
+
Review Comment:
Perhaps also test with special characters? This would probably raise with
`quoting_style='none'`
##########
python/pyarrow/tests/test_csv.py:
##########
@@ -1908,6 +1908,20 @@ def test_write_read_round_trip():
parse_options=parse_options)
+def test_write_quoting_style():
+ t = pa.Table.from_arrays([[1, 2, 3], ["a", "b", "c"]], ["c1", "c2"])
+ buf = io.BytesIO()
+ for write_options, res in [
+ (WriteOptions(quoting_style='none'), b'"c1","c2"\n1,a\n2,b\n3,c\n'),
+ (WriteOptions(), b'"c1","c2"\n1,"a"\n2,"b"\n3,"c"\n'),
+ (WriteOptions(quoting_style='all_valid'),
b'"c1","c2"\n"1","a"\n"2","b"\n"3","c"\n'),
+ ]:
+ with CSVWriter(buf, t.schema, write_options=write_options) as writer:
+ writer.write_table(t)
+ buf.seek(0)
Review Comment:
No need for `seek(0)` before `getvalue()` AFAIR.
##########
python/pyarrow/_csv.pyx:
##########
@@ -1288,20 +1301,25 @@ cdef class WriteOptions(_Weakrefable):
CSV data
delimiter : 1-character string, optional (default ",")
The character delimiting individual cells in the CSV data.
+ quoting_style : str, optional (default "needed")
+ Whether to quote values, and if so, which quoting style to use.
+ Accepted values are "needed", "all_valid", "none"
Review Comment:
We would probably like to expand the explanation a bit, for example:
```suggestion
The following values are accepted:
- "needed" (default): only enclose values in quotes when needed.
- "all_valid": enclose all valid values in quotes; nulls are not
quoted.
- "none": do not enclose any values in quotes; values containing
special characters (such as quotes, cell delimiters or line
endings)
will raise an error.
```
##########
python/pyarrow/_csv.pyx:
##########
@@ -1337,6 +1355,17 @@ cdef class WriteOptions(_Weakrefable):
def delimiter(self, value):
deref(self.options).delimiter = _single_char(value)
+ @property
+ def quoting_style(self):
+ """
+ Quoting style for CSV writing.
+ """
+ return deref(self.options).quoting_style
Review Comment:
Does this actually return the same values as expected by the constructor and
property setter?
##########
python/pyarrow/tests/test_csv.py:
##########
@@ -1908,6 +1908,20 @@ def test_write_read_round_trip():
parse_options=parse_options)
+def test_write_quoting_style():
+ t = pa.Table.from_arrays([[1, 2, 3], ["a", "b", "c"]], ["c1", "c2"])
Review Comment:
Let's add some null values there?
##########
python/pyarrow/_csv.pyx:
##########
@@ -1337,6 +1355,17 @@ cdef class WriteOptions(_Weakrefable):
def delimiter(self, value):
deref(self.options).delimiter = _single_char(value)
+ @property
+ def quoting_style(self):
+ """
+ Quoting style for CSV writing.
Review Comment:
We would rather reuse the same description as in the `WriteOptions`
docstring above.
##########
python/pyarrow/tests/test_csv.py:
##########
@@ -1908,6 +1908,20 @@ def test_write_read_round_trip():
parse_options=parse_options)
+def test_write_quoting_style():
+ t = pa.Table.from_arrays([[1, 2, 3], ["a", "b", "c"]], ["c1", "c2"])
Review Comment:
For example:
```suggestion
t = pa.Table.from_arrays([[1, 2, None], ["a", None, "c"]], ["c1", "c2"])
```
##########
python/pyarrow/tests/test_csv.py:
##########
@@ -1908,6 +1908,20 @@ def test_write_read_round_trip():
parse_options=parse_options)
+def test_write_quoting_style():
Review Comment:
Can you also expand `test_write_options` on line 324 above?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]