[GitHub] [arrow] houqp opened a new pull request #7238: ARROW-8877: [Rust] [DataFusion] introduce CsvReadOption struct to simplify UX for read CSV data

GitBox Wed, 20 May 2020 17:30:09 -0700


houqp opened a new pull request #7238:
URL: https://github.com/apache/arrow/pull/7238



   A new `CsvReadOptions` is introduced to capture the following CSV read 
configurations:
   
   * has_header
   * optional delimiter
   * optional schema
   * number of records to read for schema inference
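
   As a rough illustration of the four settings listed above, here is a minimal 
sketch of what such an options struct could look like. It is not the code from 
this PR: the field names, the builder-style setters, the default values, and 
the placeholder `Schema` type (standing in for `arrow::datatypes::Schema`) are 
all assumptions made for the example.

```rust
/// Placeholder schema type for this sketch (assumption); real code would
/// use `arrow::datatypes::Schema` instead.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub column_names: Vec<String>,
}

/// Hypothetical options struct mirroring the four settings in the list.
#[derive(Debug, Clone, Copy)]
pub struct CsvReadOptions<'a> {
    /// Whether the first line of the file is a header row.
    pub has_header: bool,
    /// Optional delimiter; `None` means "use the reader's default".
    pub delimiter: Option<u8>,
    /// Optional schema; `None` means "infer it from the data".
    pub schema: Option<&'a Schema>,
    /// Number of records to read when inferring the schema.
    pub schema_infer_max_records: usize,
}

impl<'a> CsvReadOptions<'a> {
    /// Start from defaults. The defaults chosen here are assumptions.
    pub fn new() -> Self {
        Self {
            has_header: true,
            delimiter: None,
            schema: None,
            schema_infer_max_records: 100,
        }
    }

    pub fn has_header(mut self, has_header: bool) -> Self {
        self.has_header = has_header;
        self
    }

    pub fn delimiter(mut self, delimiter: u8) -> Self {
        self.delimiter = Some(delimiter);
        self
    }

    pub fn schema(mut self, schema: &'a Schema) -> Self {
        self.schema = Some(schema);
        self
    }

    pub fn schema_infer_max_records(mut self, max_records: usize) -> Self {
        self.schema_infer_max_records = max_records;
        self
    }
}
```

   With a struct shaped like this, a caller only sets what differs from the 
defaults, for example `CsvReadOptions::new().has_header(false).delimiter(b'|')`.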
   
   See the changes in `rust/datafusion/examples/csv_sql.rs` for an example of 
what the new interface looks like.
   
   Initially I thought we could unify all CSV read code paths under a single 
options struct. It turns out that is not possible. Low-level components of 
the stack, including `arrow::csv::reader::Reader`, `CsvPartition`, and 
`CsvIterator`, all assume that the schema has already been defined, which makes 
some of the fields in `CsvReadOptions` irrelevant to them.
   
   Therefore, the newly introduced `CsvReadOptions` struct is only used in 
high-level, user-facing APIs, including `CsvFile`, `CsvExec`, 
`LogicalPlanBuilder::scan_csv`, `context::register_csv`, etc.
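
   The split described above can be sketched as follows: a high-level entry 
point accepts the options, resolves the schema (using the supplied one, or 
inferring it from at most `schema_infer_max_records` rows), and hands a 
concrete schema to a low-level reader that never sees the options. Everything 
here is hypothetical and simplified for illustration: `open_csv`, 
`infer_schema`, `LowLevelReader`, the two-variant `DataType`, and the 
placeholder `Schema` are assumptions, not DataFusion or Arrow APIs.

```rust
/// Toy type system for the sketch (assumption): integers or strings.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

/// Placeholder schema: (column name, column type) pairs.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<(String, DataType)>,
}

/// Hypothetical high-level options, as seen by user-facing APIs.
pub struct CsvReadOptions<'a> {
    pub has_header: bool,
    pub delimiter: u8,
    pub schema: Option<&'a Schema>,
    pub schema_infer_max_records: usize,
}

/// Stand-in for a low-level component: it requires a resolved schema and
/// has no notion of "optional schema" or "records to read for inference".
pub struct LowLevelReader {
    pub schema: Schema,
    pub has_header: bool,
    pub delimiter: u8,
}

/// Naive inference: names come from the header (or `column_N`), and a
/// column is Int64 only if every sampled value parses as an integer.
fn infer_schema(data: &str, options: &CsvReadOptions<'_>) -> Schema {
    let sep = options.delimiter as char;
    let mut lines = data.lines();
    let mut names: Vec<String> = Vec::new();
    if options.has_header {
        if let Some(header) = lines.next() {
            names = header.split(sep).map(|s| s.trim().to_string()).collect();
        }
    }
    let mut types: Vec<DataType> = Vec::new();
    for line in lines.take(options.schema_infer_max_records) {
        for (i, value) in line.split(sep).enumerate() {
            let seen = if value.trim().parse::<i64>().is_ok() {
                DataType::Int64
            } else {
                DataType::Utf8
            };
            if i >= types.len() {
                types.push(seen);
            } else if types[i] != seen {
                // Conflicting observations fall back to string.
                types[i] = DataType::Utf8;
            }
        }
    }
    let width = names.len().max(types.len());
    let fields = (0..width)
        .map(|i| {
            let name = names
                .get(i)
                .cloned()
                .unwrap_or_else(|| format!("column_{}", i + 1));
            (name, types.get(i).copied().unwrap_or(DataType::Utf8))
        })
        .collect();
    Schema { fields }
}

/// High-level entry point: the only place the optional fields are consulted.
pub fn open_csv(data: &str, options: &CsvReadOptions<'_>) -> LowLevelReader {
    let schema = match options.schema {
        Some(schema) => schema.clone(),
        None => infer_schema(data, options),
    };
    LowLevelReader {
        schema,
        has_header: options.has_header,
        delimiter: options.delimiter,
    }
}
```

   In this arrangement the inference-related fields are consumed once at the 
top of the stack, which is one way to read why they would be irrelevant to the 
lower-level components.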


