bkietz commented on a change in pull request #9725:
URL: https://github.com/apache/arrow/pull/9725#discussion_r597084271

##########
File path: cpp/src/arrow/dataset/file_csv.h
##########
@@ -38,6 +38,13 @@ class ARROW_DS_EXPORT CsvFileFormat : public FileFormat {
  public:
   /// Options affecting the parsing of CSV files
   csv::ParseOptions parse_options = csv::ParseOptions::Defaults();
+  /// Number of header rows to skip (see arrow::csv::ReadOptions::skip_rows)
+  int32_t skip_rows = 0;
+  /// Column names for the target table (see arrow::csv::ReadOptions::column_names)
+  std::vector<std::string> column_names;

Review comment:
   I'm still -0 on including this option. It makes sense for a single-file reader, but I don't think datasets need to provide two approaches for renaming columns.

   By contrast, skip_rows and autogenerate are necessary here to accommodate the cases where files
   - have non-CSV front matter, like a license comment, which we shouldn't attempt to parse
   - have data in their first row, which we shouldn't interpret as column names

   respectively.

   @jorisvandenbossche @nealrichardson what do you think?
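
   To make the two cases above concrete, here is a minimal standalone sketch of the intended semantics. It does not use Arrow at all: `SketchCsvOptions`, `SketchTable`, `SplitLine` and `ReadSketch` are hypothetical names invented for illustration, the splitting ignores quoting and escaping, and the `f0`, `f1`, ... naming is only meant to mirror how arrow::csv::ReadOptions documents autogenerated names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the two options discussed above; not Arrow's API.
struct SketchCsvOptions {
  int skip_rows = 0;                       // leading lines to discard unparsed
  bool autogenerate_column_names = false;  // treat the first kept line as data
};

struct SketchTable {
  std::vector<std::string> column_names;
  std::vector<std::vector<std::string>> rows;
};

// Split one line on commas (no quoting or escaping; a sketch only).
std::vector<std::string> SplitLine(const std::string& line) {
  std::vector<std::string> fields;
  std::stringstream ss(line);
  std::string field;
  while (std::getline(ss, field, ',')) fields.push_back(field);
  return fields;
}

SketchTable ReadSketch(const std::string& text, const SketchCsvOptions& options) {
  SketchTable table;
  std::stringstream lines(text);
  std::string line;

  // Case 1: discard front matter (e.g. a license comment) without parsing it.
  for (int i = 0; i < options.skip_rows && std::getline(lines, line); ++i) {
  }

  bool have_names = false;
  while (std::getline(lines, line)) {
    std::vector<std::string> fields = SplitLine(line);
    if (!have_names) {
      have_names = true;
      if (options.autogenerate_column_names) {
        // Case 2: the first kept line is data, so invent names f0, f1, ...
        for (std::size_t i = 0; i < fields.size(); ++i) {
          table.column_names.push_back("f" + std::to_string(i));
        }
      } else {
        // Default: the first kept line is the header, not a data row.
        table.column_names = fields;
        continue;
      }
    }
    table.rows.push_back(fields);
  }
  return table;
}
```

   With `skip_rows = 1` and `autogenerate_column_names = true`, a file that starts with one comment line followed by headerless data keeps every data row and gets generated names. With the defaults, the first line is consumed as the header. Neither case needs a user-supplied `column_names` list.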