[GitHub] [arrow] pitrou commented on a change in pull request #9725: ARROW-8631: [C++][Python][Dataset] Add ReadOptions to CsvFileFormat, expose options to python

GitBox Mon, 22 Mar 2021 08:35:13 -0700


pitrou commented on a change in pull request #9725:
URL: https://github.com/apache/arrow/pull/9725#discussion_r598829828




##########
File path: cpp/src/arrow/dataset/file_csv.h
##########
@@ -33,11 +33,35 @@ namespace dataset {
 
 constexpr char kCsvTypeName[] = "csv";
 
+/// \brief Per-scan options for CSV fragments
+struct ARROW_DS_EXPORT CsvFragmentScanOptions : public FragmentScanOptions {
+  std::string type_name() const override { return kCsvTypeName; }
+
+  /// CSV conversion options
+  csv::ConvertOptions convert_options = csv::ConvertOptions::Defaults();
+
+  /// Block size for reading (see arrow::csv::ReadOptions::block_size)
+  int32_t block_size = 1 << 20;  // 1 MB
+};
+
 /// \brief A FileFormat implementation that reads from and writes to Csv files
 class ARROW_DS_EXPORT CsvFileFormat : public FileFormat {
  public:
-  /// Options affecting the parsing of CSV files
-  csv::ParseOptions parse_options = csv::ParseOptions::Defaults();
+  /// \brief Dataset-wide options for CSV. For convenience (both for users, and
+  /// for authors of language bindings), these include fields for per-scan
+  /// options, however, if per-scan options are provided, then those fields 
will
+  /// be overridden.
+  struct ReaderOptions : CsvFragmentScanOptions {

Review comment:
       That would be reasonable if the ignored fields had semantic 
implications. But here it really feels clumsy to have top-level options and an 
embedded ParseOptions struct inside.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] pitrou commented on a change in pull request #9725: ARROW-8631: [C++][Python][Dataset] Add ReadOptions to CsvFileFormat, expose options to python

Reply via email to