[PR] GH-47897: [C++][Python] Allow default column type for CSV columns [arrow]

via GitHub Tue, 21 Oct 2025 09:22:24 -0700


cottrell opened a new pull request, #47898:
URL: https://github.com/apache/arrow/pull/47898


     ### Rationale for this change
     Users who want to coerce every CSV column to a single type currently have 
to pre-compute a schema or enumerate column names. Adding a default type on 
`ConvertOptions` removes that friction (e.g. “read everything as string”).
   
     ### What changes are included in this PR?
     - add `ConvertOptions::column_type` in C++ and honor it in the CSV reader 
when no per-column mapping exists
     - expose the knob as `csv.ConvertOptions(column_type=…)` in PyArrow, with 
documentation updates
     - extend the Python CSV tests to cover string, integer, and float 
defaults, including `include_missing_columns`
   
     ### Are these changes tested?
     - `make test-csv`
     - `pytest python/pyarrow/tests/test_cpp_internals.py`
     - a full `pytest python/pyarrow` run was attempted, but it times out in the 
dataset backpressure test in this environment (documented limitation)
   
     ### Are there any user-facing changes?
     - new public `column_type` parameter on `pyarrow.csv.ConvertOptions`
     - documentation additions showing how to set a single default type
   

