[jira] [Created] (ARROW-16834) Handle impossible conversions in csv.ConvertOptions

Tim Loderhose (Jira) Wed, 15 Jun 2022 06:28:06 -0700

Tim Loderhose created ARROW-16834:
-------------------------------------

             Summary: Handle impossible conversions in csv.ConvertOptions
                 Key: ARROW-16834
                 URL: https://issues.apache.org/jira/browse/ARROW-16834
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 8.0.0
            Reporter: Tim Loderhose



`pyarrow.csv.ParseOptions`
(https://arrow.apache.org/docs/python/generated/pyarrow.csv.ParseOptions.html#pyarrow.csv.ParseOptions)
allows skipping invalid rows by means of its `invalid_row_handler` option.

In `pyarrow.csv.ConvertOptions`
(https://arrow.apache.org/docs/python/generated/pyarrow.csv.ConvertOptions.html#pyarrow.csv.ConvertOptions),
one can supply a schema via `column_types` to get correct types in the
resulting table.
I have a data source that almost always follows a specific schema, but its data 
isn't validated beforehand. In practice, it's possible for a field which is 
int16 99.9% of the time to have an out-of-range value in a few rows.

I'd like to handle those cases similarly to the `invalid_row_handler`, perhaps 
with an option to set failed conversions to NULL, or by supplying a handler 
that applies a more specific operation.
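
To make the requested behavior concrete, here is a minimal pure-Python sketch 
of the semantics, using only the standard `csv` module. It is not pyarrow 
code: `to_int16`, `null_on_failure`, `convert_rows` and the `on_error` 
parameter are hypothetical names invented for this illustration, and the 
sample data is made up.

```python
import csv
import io

INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1


def to_int16(text):
    """Parse text as an integer and reject values outside the int16 range."""
    value = int(text)
    if not INT16_MIN <= value <= INT16_MAX:
        raise ValueError(f"{text!r} is out of range for int16")
    return value


def null_on_failure(column, text, error):
    """Hypothetical handler: replace any failed conversion with None (NULL)."""
    return None


def convert_rows(lines, converters, on_error=null_on_failure):
    """Yield one dict per CSV row, routing failed conversions to on_error."""
    for row in csv.DictReader(lines):
        converted = {}
        for column, text in row.items():
            # Columns without a converter are kept as strings.
            convert = converters.get(column, str)
            try:
                converted[column] = convert(text)
            except ValueError as error:
                converted[column] = on_error(column, text, error)
        yield converted


# Made-up sample: the second row holds a value that does not fit in int16.
data = io.StringIO("id,count\na,12\nb,70000\nc,-5\n")
rows = list(convert_rows(data, {"count": to_int16}))
```

With the default handler, the out-of-range value in the second row becomes 
`None` while the other rows convert normally. A more specific handler could 
instead clamp the value or re-raise the error, which is the kind of choice I 
would like `ConvertOptions` to expose.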



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
