jorisvandenbossche commented on a change in pull request #9725:
URL: https://github.com/apache/arrow/pull/9725#discussion_r597149363

##########
File path: cpp/src/arrow/dataset/file_csv.h
##########
@@ -38,6 +38,13 @@ class ARROW_DS_EXPORT CsvFileFormat : public FileFormat {
  public:
   /// Options affecting the parsing of CSV files
   csv::ParseOptions parse_options = csv::ParseOptions::Defaults();
+  /// Number of header rows to skip (see arrow::csv::ReadOptions::skip_rows)
+  int32_t skip_rows = 0;
+  /// Column names for the target table (see arrow::csv::ReadOptions::column_names)
+  std::vector<std::string> column_names;

Review comment:
   AFAIK, `column_names` is not meant to replace existing column names (which would amount to an implicit renaming); it is meant to specify the column names when they are not present in the file. Given that, I think it makes a lot of sense to include it here. If you have a set of regular CSV files (all having the same number of columns, in the same order, etc.) that don't have names embedded, you would otherwise not be able to read them using the Datasets API.
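
To illustrate the semantics described in the review comment, here is a minimal stand-alone sketch. It does not use Arrow: `SketchReadOptions`, `SketchTable`, `SplitLine` and `ReadCsv` are hypothetical names made up for this example, and the parsing is deliberately naive. It only mirrors the behaviour I understand `skip_rows` and `column_names` to have: when `column_names` is empty, the first row after the skipped rows is taken as the header; when it is non-empty, the given names are used and every remaining row is treated as data.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the two options in the diff; NOT the Arrow classes.
struct SketchReadOptions {
  int skip_rows = 0;                      // rows discarded before anything else
  std::vector<std::string> column_names;  // if empty, names come from the file
};

// Hypothetical result type: column names plus rows of raw string fields.
struct SketchTable {
  std::vector<std::string> names;
  std::vector<std::vector<std::string>> rows;
};

// Split one line on commas. No quoting or escaping, and a trailing empty
// field is dropped; good enough for a sketch only.
std::vector<std::string> SplitLine(const std::string& line) {
  std::vector<std::string> fields;
  std::stringstream ss(line);
  std::string field;
  while (std::getline(ss, field, ',')) fields.push_back(field);
  return fields;
}

SketchTable ReadCsv(const std::string& text, const SketchReadOptions& options) {
  std::istringstream in(text);
  std::string line;
  SketchTable table;

  // 1. Discard the first `skip_rows` lines.
  for (int i = 0; i < options.skip_rows && std::getline(in, line); ++i) {
  }

  // 2. Decide where the column names come from.
  if (options.column_names.empty()) {
    // No names supplied: the next row is consumed as the header.
    if (std::getline(in, line)) table.names = SplitLine(line);
  } else {
    // Names supplied: nothing is consumed, so the first row stays data.
    table.names = options.column_names;
  }

  // 3. Everything that remains is data.
  while (std::getline(in, line)) table.rows.push_back(SplitLine(line));
  return table;
}
```

With the headerless input `"1,a\n2,b\n"`, setting `column_names = {"id", "label"}` in this sketch keeps both rows as data under those names, whereas leaving it empty consumes `1,a` as the header and leaves a single data row. That is the case the comment is concerned with: no renaming takes place, the names simply fill in for a header the files do not have.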