Weston Pace created ARROW-13580:
-----------------------------------

             Summary: [C++] quoted_strings_can_be_null only applied to string columns
                 Key: ARROW-13580
                 URL: https://issues.apache.org/jira/browse/ARROW-13580
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Weston Pace


My interpretation of the "string" in quoted_strings_can_be_null is that it 
refers to the unparsed CSV input string, not the output data type.

So when converting:

{code:csv}
"one","two","three"
"1","2","3"
"4","","6"
{code}


We should get...
[1, 4], [2, None], [3, 6]

...currently we get...
[1, 4], ['2', None], [3, 6]
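
The interpretation above can be sketched in plain Python. This is only an illustration of the proposed semantics, not Arrow's actual implementation: the helper names (parse_line, convert_cell, read_int_columns) and the NULL_VALUES set are hypothetical, every column is assumed to be an integer column, and the tokenizer ignores escaped quotes and embedded commas.

```python
# Hypothetical sketch: a quoted null-like field becomes null regardless of
# the output column type. Not Arrow code; names are made up for illustration.

NULL_VALUES = {"", "NA", "NULL"}  # assumed, illustrative subset of null spellings


def parse_line(line):
    """Split one CSV line into (text, was_quoted) pairs (no escapes handled)."""
    fields = []
    for raw in line.split(","):
        quoted = len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"'
        fields.append((raw[1:-1] if quoted else raw, quoted))
    return fields


def convert_cell(text, quoted, quoted_strings_can_be_null=True):
    """Convert one cell to int, or None if it matches a null spelling."""
    if text in NULL_VALUES and (not quoted or quoted_strings_can_be_null):
        return None
    return int(text)


def read_int_columns(data, quoted_strings_can_be_null=True):
    """Read CSV text into a dict of column name -> list of int or None."""
    rows = [parse_line(line) for line in data.splitlines()]
    names = [text for text, _ in rows[0]]
    columns = {name: [] for name in names}
    for row in rows[1:]:
        for name, (text, quoted) in zip(names, row):
            columns[name].append(
                convert_cell(text, quoted, quoted_strings_can_be_null)
            )
    return columns


DATA = '"one","two","three"\n"1","2","3"\n"4","","6"'
print(read_int_columns(DATA))
```

In this sketch the "quoted" flag describes the raw input field only, so the quoted empty field in column "two" becomes a null in an integer column rather than forcing the column to a string type.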

In pandas the above string parses to...

{code:python}
>>> import io
>>> import pandas
>>> f = io.BytesIO(b'"one","two","three"\n"1","2","3"\n"4","","6"')
>>> pandas.read_csv(f)
   one  two  three
0    1  2.0      3
1    4  NaN      6
{code}

So the proposed change would bring us closer to pandas' behavior, which is 
probably a good thing.

Inspired by: https://github.com/apache/arrow/issues/10892 


