Weston Pace created ARROW-13580:
-----------------------------------
Summary: [C++] quoted_strings_can_be_null only applied to string
columns
Key: ARROW-13580
URL: https://issues.apache.org/jira/browse/ARROW-13580
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Weston Pace
My interpretation of the "string" in quoted_strings_can_be_null is that it
refers to the unparsed CSV input string, not the actual output data type.
So when converting:
{code:csv}
"one","two","three"
"1","2","3"
"4","","6"
{code}
We should get...
[1, 4], [2, None], [3, 6]
...currently we get...
[1, 4], ['2', None], [3, 6]
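
To make the proposed interpretation concrete, here is a minimal pure-Python sketch, not Arrow's implementation. It decides nullness on the raw input text before the output type is known, so the option applies to every column type. The helper names (split_fields, convert_column, read_columns) and the NULL_VALUES set are hypothetical. The tokenizer is deliberately simplified and assumes no escaped quotes or embedded newlines.

```python
NULL_VALUES = {""}  # assumption: only the empty string spells null here


def split_fields(line):
    """Split one CSV line into (text, was_quoted) pairs.

    Simplified on purpose: no escaped quotes or embedded newlines.
    """
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':
            end = line.index('"', i + 1)  # position of the closing quote
            fields.append((line[i + 1:end], True))
            i = end + 1
        else:
            end = line.find(",", i)
            if end == -1:
                end = n
            fields.append((line[i:end], False))
            i = end
        if i >= n:
            break
        i += 1  # skip the comma separator
    return fields


def convert_column(cells, quoted_strings_can_be_null=True):
    """Convert raw cells to int if every non-null cell parses, else keep str."""

    def is_null(text, was_quoted):
        # The check looks only at the unparsed input, never at the output type.
        if was_quoted and not quoted_strings_can_be_null:
            return False
        return text in NULL_VALUES

    raw = [None if is_null(t, q) else t for t, q in cells]
    try:
        return [None if t is None else int(t) for t in raw]
    except ValueError:
        return raw  # fall back to a string column


def read_columns(text, **options):
    """Parse CSV text with a header row into a dict of converted columns."""
    rows = [split_fields(line) for line in text.splitlines()]
    header, body = rows[0], rows[1:]
    return {
        name: convert_column([row[i] for row in body], **options)
        for i, (name, _) in enumerate(header)
    }


data = '"one","two","three"\n"1","2","3"\n"4","","6"'
print(read_columns(data))
# {'one': [1, 4], 'two': [2, None], 'three': [3, 6]}
```

In this sketch, passing quoted_strings_can_be_null=False leaves the quoted empty cell as a plain empty string, so column "two" can no longer be read as integers and stays a string column.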
In pandas the above string parses to...
{code:python}
>>> import io
>>> import pandas
>>> f = io.BytesIO(b'"one","two","three"\n"1","2","3"\n"4","","6"')
>>> pandas.read_csv(f)
   one  two  three
0    1  2.0      3
1    4  NaN      6
{code}
So applying the option to all column types would bring us closer to pandas,
which is probably a good thing.
Inspired by: https://github.com/apache/arrow/issues/10892