nealrichardson commented on a change in pull request #9725: URL: https://github.com/apache/arrow/pull/9725#discussion_r597104491
########## File path: cpp/src/arrow/dataset/file_csv.h ########## @@ -38,6 +38,13 @@ class ARROW_DS_EXPORT CsvFileFormat : public FileFormat { public: /// Options affecting the parsing of CSV files csv::ParseOptions parse_options = csv::ParseOptions::Defaults(); + /// Number of header rows to skip (see arrow::csv::ReadOptions::skip_rows) + int32_t skip_rows = 0; + /// Column names for the target table (see arrow::csv::ReadOptions::column_names) + std::vector<std::string> column_names; Review comment: I see what you're saying @bkietz, but to make the case for including it: 1. Maybe it's redundant, but is eliminating that redundancy worth adding surprises to our UI (where the usual options supported in the file reader are mostly supported in datasets, except for this one)? 2. Is it redundant? What is the other approach for renaming columns in a dataset? If it's ScannerBuilder::Project, that's arguably not renaming the dataset, just the scan results. It's also not trivial to rename columns this way if the reason why you're providing them is because there aren't any in the csv file. If it's that you can provide a Schema when you create the dataset, that means you also have to know/specify all of the column types, which you may not want to do or be able to do. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org