cottrell opened a new pull request, #47898:
URL: https://github.com/apache/arrow/pull/47898
### Rationale for this change
Users who want to coerce every CSV column to a single type currently have
to pre-compute a schema or list every column name in `column_types`. Adding
a default type to `ConvertOptions` removes that friction (e.g. “read everything as string”).
### What changes are included in this PR?
- add `ConvertOptions::column_type` in C++ and have the CSV reader apply it
to any column that has no per-column entry in `column_types`
- expose the knob as `csv.ConvertOptions(column_type=…)` in PyArrow, with
documentation updates
- extend the Python CSV tests to cover string, integer, and float
defaults, including their interaction with `include_missing_columns`
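The precedence described above (a per-column entry in `column_types` wins, otherwise the new default `column_type` applies, otherwise the reader falls back to type inference) can be sketched in plain Python. This is a hypothetical model, not the Arrow C++ or PyArrow implementation: `resolve_column_type` and `resolve_all` are illustrative names, types are plain strings rather than `pyarrow.DataType` objects, and `include_missing_columns` handling is not modeled.

```python
# Hypothetical model of the type-resolution order described in this PR.
# Not Arrow's implementation; types are plain strings for illustration only.
from typing import Dict, List, Mapping, Optional


def resolve_column_type(
    name: str,
    column_types: Mapping[str, str],
    column_type: Optional[str],
) -> Optional[str]:
    """Return the type for column `name`, or None meaning "infer from data"."""
    # An explicit per-column entry always takes precedence.
    if name in column_types:
        return column_types[name]
    # Otherwise fall back to the single default type, if one was set.
    if column_type is not None:
        return column_type
    # Neither is set: the reader would infer the type from the values.
    return None


def resolve_all(
    header: List[str],
    column_types: Mapping[str, str],
    column_type: Optional[str],
) -> Dict[str, Optional[str]]:
    """Resolve the type for every column named in the CSV header."""
    return {name: resolve_column_type(name, column_types, column_type) for name in header}


if __name__ == "__main__":
    header = ["id", "price", "note"]
    # "Read everything as string", except `price`, which has its own entry.
    print(resolve_all(header, {"price": "float64"}, "string"))
    # {'id': 'string', 'price': 'float64', 'note': 'string'}
```

Keeping the per-column mapping as the first check means existing `column_types` users see no behavior change; the default only fills in columns they did not name.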
### Are these changes tested?
- `make test-csv`
- `pytest python/pyarrow/tests/test_cpp_internals.py`
- a full `pytest python/pyarrow` run times out in the dataset
backpressure test in this environment (documented limitation)
### Are there any user-facing changes?
- new public `column_type` parameter on `pyarrow.csv.ConvertOptions`
- documentation additions showing how to set a single default type
--
This is an automated message from the Apache Git Service.