lidavidm commented on a change in pull request #10230:
URL: https://github.com/apache/arrow/pull/10230#discussion_r631239611
##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -1819,6 +1824,28 @@ cdef class CsvFragmentScanOptions(FragmentScanOptions):
self.read_options)
+cdef class CsvFileWriteOptions(FileWriteOptions):
Review comment:
Unfortunately, in the context of datasets (and only datasets), all the
other classes already use `Csv`.
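
For illustration, a minimal sketch of the naming pattern in question; the
names below are the ones visible in this PR's hunks, shown only to make the
`Csv`-vs-`CSV` point concrete:

```cpp
namespace arrow {
namespace dataset {

// Existing dataset-layer names already spell it "Csv":
class CsvFileFormat;
struct CsvFragmentScanOptions;

// So the new writer options class follows the same spelling:
class CsvFileWriteOptions;

}  // namespace dataset
}  // namespace arrow
```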
##########
File path: cpp/src/arrow/dataset/file_csv.h
##########
@@ -83,6 +82,35 @@ struct ARROW_DS_EXPORT CsvFragmentScanOptions : public FragmentScanOptions {
csv::ReadOptions read_options = csv::ReadOptions::Defaults();
};
+class ARROW_DS_EXPORT CsvFileWriteOptions : public FileWriteOptions {
+ public:
+ /// Options passed to csv::MakeCSVWriter. use_threads is ignored
Review comment:
I copied this from the equivalent IPC struct; of course it doesn't apply
here, since there's no such parameter.
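
A minimal sketch of how the comment might read once the copied sentence is
dropped; the member shown is just a placeholder for whatever the PR actually
declares here:

```cpp
class ARROW_DS_EXPORT CsvFileWriteOptions : public FileWriteOptions {
 public:
  /// Options passed to csv::MakeCSVWriter.
  /// (Unlike the IPC equivalent, there is no use_threads parameter here.)
  std::shared_ptr<csv::WriteOptions> write_options;  // placeholder member
};
```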
##########
File path: cpp/src/arrow/csv/writer.cc
##########
@@ -403,34 +415,44 @@ class CSVConverter {
}
static constexpr int64_t kColumnSizeGuess = 8;
+ io::OutputStream* sink_;
+ std::shared_ptr<io::OutputStream> owned_sink_;
Review comment:
I agree it seems weird, but both the IPC and Parquet writers use a
`shared_ptr` for this.
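
For context, a hedged sketch of that ownership pattern (illustrative names,
not the PR's exact code): the raw pointer is what the writer actually writes
to, and the `shared_ptr` only exists to keep a caller-supplied stream alive
for the writer's lifetime:

```cpp
#include <memory>
#include <utility>

#include "arrow/io/interfaces.h"

// Illustrative sketch of the two-member sink pattern, mirroring what the
// IPC and Parquet writers do; the class name is hypothetical.
class WriterSketch {
 public:
  // Caller keeps ownership; the stream must outlive the writer.
  explicit WriterSketch(arrow::io::OutputStream* sink) : sink_(sink) {}

  // Writer shares ownership, so the stream cannot be destroyed under it.
  explicit WriterSketch(std::shared_ptr<arrow::io::OutputStream> sink)
      : sink_(sink.get()), owned_sink_(std::move(sink)) {}

 private:
  arrow::io::OutputStream* sink_;  // always set; used for all writes
  std::shared_ptr<arrow::io::OutputStream> owned_sink_;  // null if non-owning
};
```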
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]