houqp opened a new pull request #7238: URL: https://github.com/apache/arrow/pull/7238
A new `CsvReadOptions` struct is introduced to capture the following CSV read configurations:

* has_header
* optional delimiter
* optional schema
* number of records to read for schema inference

See the changes in `rust/datafusion/examples/csv_sql.rs` for an example of what the new interface looks like.

Initially I thought we could unify all CSV read code paths under one single options struct. It turns out that's not possible. Components from the lower levels of the stack, including `arrow::csv::reader::Read`, `CsvPartition`, and `CsvIterator`, all work under the assumption that the schema has already been defined, which makes some of the fields in `CsvReadOptions` irrelevant to them. Therefore, the newly introduced `CsvReadOptions` struct is only used in high-level, user-facing APIs, including `CsvFile`, `CsvExec`, `LogicalPlanBuilder::scan_csv`, `context::register_csv`, etc.
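
For readers who have not opened the diff, here is a minimal, self-contained sketch of what an options struct covering the four settings listed in the description could look like. It is not the code from this pull request: the field names, the builder-style methods, the default values, the `needs_inference` helper, and the stand-in `Schema` type are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for a real schema type; kept minimal so the
/// sketch compiles without any external crates.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub column_names: Vec<String>,
}

/// Illustrative options struct. Field names and defaults are assumptions,
/// not the definitions introduced by the pull request.
#[derive(Debug, Clone, PartialEq)]
pub struct CsvReadOptions<'a> {
    /// Whether the first row of the file holds column names.
    pub has_header: bool,
    /// Optional field delimiter; `None` leaves the choice to the reader.
    pub delimiter: Option<u8>,
    /// Optional schema; when absent, the schema has to be inferred.
    pub schema: Option<&'a Schema>,
    /// Upper bound on the number of records scanned during inference.
    pub schema_infer_max_records: usize,
}

impl<'a> CsvReadOptions<'a> {
    /// Start from assumed defaults: header present, no explicit delimiter,
    /// no schema, and an arbitrary inference limit of 100 records.
    pub fn new() -> Self {
        Self {
            has_header: true,
            delimiter: None,
            schema: None,
            schema_infer_max_records: 100,
        }
    }

    /// Set whether the file starts with a header row.
    pub fn has_header(mut self, has_header: bool) -> Self {
        self.has_header = has_header;
        self
    }

    /// Set an explicit delimiter byte, e.g. `b'\t'`.
    pub fn delimiter(mut self, delimiter: u8) -> Self {
        self.delimiter = Some(delimiter);
        self
    }

    /// Supply a schema up front, which makes inference unnecessary.
    pub fn schema(mut self, schema: &'a Schema) -> Self {
        self.schema = Some(schema);
        self
    }

    /// Limit how many records are read when inferring the schema.
    pub fn schema_infer_max_records(mut self, max_records: usize) -> Self {
        self.schema_infer_max_records = max_records;
        self
    }

    /// True when no schema was supplied, so the caller must infer one.
    pub fn needs_inference(&self) -> bool {
        self.schema.is_none()
    }
}
```

Under these assumptions, a caller would write something like `CsvReadOptions::new().has_header(false).delimiter(b'\t')` and pass the result to a high-level entry point. The sketch also shows why such a struct fits poorly at the lower levels: once a schema is known, `schema` and `schema_infer_max_records` carry no useful information for a reader that only parses rows.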
