nealrichardson commented on a change in pull request #9725:
URL: https://github.com/apache/arrow/pull/9725#discussion_r597104491



##########
File path: cpp/src/arrow/dataset/file_csv.h
##########
@@ -38,6 +38,13 @@ class ARROW_DS_EXPORT CsvFileFormat : public FileFormat {
  public:
   /// Options affecting the parsing of CSV files
   csv::ParseOptions parse_options = csv::ParseOptions::Defaults();
+  /// Number of header rows to skip (see arrow::csv::ReadOptions::skip_rows)
+  int32_t skip_rows = 0;
+  /// Column names for the target table (see 
arrow::csv::ReadOptions::column_names)
+  std::vector<std::string> column_names;

Review comment:
       I see what you're saying @bkietz, but to make the case for including it:
   
   1. Maybe it's redundant, but is eliminating that redundancy worth adding 
surprises to our UI (where the usual options supported in the file reader are 
mostly supported in datasets, except for this one)?
   2. Is it redundant? What is the other approach for renaming columns in a 
dataset? If it's ScannerBuilder::Project, that's arguably not renaming the 
dataset, just the scan results. It's also not trivial to rename columns this 
way if the reason why you're providing them is because there aren't any in the 
csv file. If it's that you can provide a Schema when you create the dataset, 
that means you also have to know/specify all of the column types, which you may 
not want to do or be able to do.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to