[ https://issues.apache.org/jira/browse/ARROW-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823789#comment-16823789 ]
Antoine Pitrou commented on ARROW-5195:
---------------------------------------

Right, we should add an option for this. Currently any string is a valid string value.

> [Python] read_csv ignores null_values on string types
> -----------------------------------------------------
>
>                 Key: ARROW-5195
>                 URL: https://issues.apache.org/jira/browse/ARROW-5195
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.13.0
>         Environment: Python 3.6, PyArrow 0.13.0, AWS linux, debian-slim in docker
>            Reporter: Scott Burns
>            Priority: Minor
>             Fix For: 0.14.0
>
>
> Let's write a simple CSV with NULL values in a string column:
> {quote}from pyarrow import csv
> with open('foo.csv', 'w') as fobj:
>     fobj.write('col1,col2\n1,value\n2,NULL')
> table = csv.read_csv('foo.csv')
> table.column('col2').null_count # => 0
> {quote}
>
> table.column('col2').null_count will be 0; I think it should be 1. Passing in {{ConvertOptions(null_values=["NULL"])}} doesn't help.
>
> Note that {{pandas.read_csv}} parses these NULLs correctly, so I have a workaround available.
> But I'd prefer to natively read CSV from pyarrow if possible :)
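
For illustration only, here is a minimal sketch of the behaviour the report asks for: values listed as null markers become nulls even in a string column. It uses only Python's standard-library csv module and is not pyarrow's implementation. The helper names read_columns and null_count are hypothetical and were chosen for this example.

```python
import csv
import io


def read_columns(text, null_values=("NULL",)):
    """Parse CSV text into {column name: list of values}.

    Any cell whose text appears in null_values is stored as None,
    regardless of whether the column otherwise holds strings.
    Hypothetical helper, not part of pyarrow.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(None if value in null_values else value)
    return columns


def null_count(values):
    """Count the None entries in a column."""
    return sum(v is None for v in values)


# Same data as in the report above.
columns = read_columns("col1,col2\n1,value\n2,NULL")
print(null_count(columns["col2"]))  # prints 1
```

With an empty null_values tuple the sketch keeps "NULL" as an ordinary string, which matches the behaviour described in the comment above ("any string is a valid string value").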