[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #9725: ARROW-8631: [C++][Python][Dataset] Add ReadOptions to CsvFileFormat, expose options to python

GitBox Thu, 18 Mar 2021 11:46:02 -0700


jorisvandenbossche commented on a change in pull request #9725:
URL: https://github.com/apache/arrow/pull/9725#discussion_r597149363




##########
File path: cpp/src/arrow/dataset/file_csv.h
##########
@@ -38,6 +38,13 @@ class ARROW_DS_EXPORT CsvFileFormat : public FileFormat {
  public:
   /// Options affecting the parsing of CSV files
   csv::ParseOptions parse_options = csv::ParseOptions::Defaults();
+  /// Number of header rows to skip (see arrow::csv::ReadOptions::skip_rows)
+  int32_t skip_rows = 0;
+  /// Column names for the target table (see arrow::csv::ReadOptions::column_names)
+  std::vector<std::string> column_names;

Review comment:
   AFAIK, `column_names` is not meant to replace existing column names (which 
would amount to an implicit renaming); it is meant to specify column names when 
they are not present in the file. 
   
   Seen that way, I think it makes a lot of sense to include it here. If you 
have a set of regular CSV files (all having the same number of columns, in the 
same order, etc.) that don't have names embedded, you would otherwise not be 
able to read them using the Datasets API.
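
   To make the distinction concrete, here is a minimal, self-contained sketch of 
the semantics described above. It is not Arrow's implementation: `SimpleTable`, 
`SplitLine` and `ReadCsv` are hypothetical names invented for illustration, and 
the parser ignores quoting and escaping. The only assumption carried over from 
the comment is that a non-empty `column_names` supplies names for a header-less 
file instead of renaming an existing header.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical miniature table type; not an Arrow class.
struct SimpleTable {
  std::vector<std::string> names;              // column names
  std::vector<std::vector<std::string>> rows;  // one vector of fields per data row
};

// Split one line on commas (no quoting or escaping, to keep the sketch short).
std::vector<std::string> SplitLine(const std::string& line) {
  std::vector<std::string> fields;
  std::stringstream ss(line);
  std::string field;
  while (std::getline(ss, field, ',')) fields.push_back(field);
  return fields;
}

// If column_names is empty, the first non-empty line is consumed as the header.
// If column_names is non-empty, every line is treated as data and the given
// names are used as-is (assumed semantics, mirroring the comment above).
SimpleTable ReadCsv(const std::string& text,
                    const std::vector<std::string>& column_names) {
  SimpleTable table;
  table.names = column_names;
  bool need_header = column_names.empty();

  std::stringstream ss(text);
  std::string line;
  while (std::getline(ss, line)) {
    if (line.empty()) continue;  // skip blank lines
    std::vector<std::string> fields = SplitLine(line);
    if (need_header) {
      table.names = fields;
      need_header = false;
      continue;
    }
    // The sketch only handles regular files: every row must match the names.
    assert(fields.size() == table.names.size());
    table.rows.push_back(fields);
  }
  return table;
}
```

   With this sketch, `ReadCsv("1,2\n3,4\n", {"a", "b"})` keeps both lines as data 
under the names `a` and `b`, whereas `ReadCsv("x,y\n1,2\n", {})` takes `x` and 
`y` from the first line. Passing names for a file that already has a header 
would leave that header line in the data rather than rename it, which is why 
the option is about header-less files.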




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
