lidavidm commented on a change in pull request #10230:
URL: https://github.com/apache/arrow/pull/10230#discussion_r631239611
##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -1819,6 +1824,28 @@ cdef class CsvFragmentScanOptions(FragmentScanOptions):
self.read_options)
+cdef class CsvFileWriteOptions(FileWriteOptions):
Review comment:
Unfortunately, in the context of datasets (and only datasets), all the
other classes already use `Csv`.
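
For illustration, a minimal sketch of the naming pattern in question; the
names below are the ones visible in this PR's hunks, shown only to make the
`Csv`-vs-`CSV` point concrete:

```cpp
namespace arrow {
namespace dataset {

// Existing dataset-layer names already spell it "Csv":
class CsvFileFormat;
struct CsvFragmentScanOptions;

// So the new writer options class follows the same spelling:
class CsvFileWriteOptions;

}  // namespace dataset
}  // namespace arrow
```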
##########
File path: cpp/src/arrow/dataset/file_csv.h
##########
@@ -83,6 +82,35 @@ struct ARROW_DS_EXPORT CsvFragmentScanOptions : public FragmentScanOptions {
csv::ReadOptions read_options = csv::ReadOptions::Defaults();
};
+class ARROW_DS_EXPORT CsvFileWriteOptions : public FileWriteOptions {
+ public:
+ /// Options passed to csv::MakeCSVWriter. use_threads is ignored
Review comment:
I copied this from the equivalent IPC struct; of course it doesn't apply
here, since there's no such parameter.
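
A minimal sketch of how the comment might read once the copied sentence is
dropped; the member shown is just a placeholder for whatever the PR actually
declares here:

```cpp
class ARROW_DS_EXPORT CsvFileWriteOptions : public FileWriteOptions {
 public:
  /// Options passed to csv::MakeCSVWriter.
  /// (Unlike the IPC equivalent, there is no use_threads parameter here.)
  std::shared_ptr<csv::WriteOptions> write_options;  // placeholder member
};
```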
##########
File path: cpp/src/arrow/csv/writer.cc
##########
@@ -403,34 +415,44 @@ class CSVConverter {
}
static constexpr int64_t kColumnSizeGuess = 8;
+ io::OutputStream* sink_;
+ std::shared_ptr<io::OutputStream> owned_sink_;
Review comment:
I agree it seems weird, but both the IPC and Parquet writers use a
`shared_ptr` for this.
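
For context, a hedged sketch of that ownership pattern (illustrative names,
not the PR's exact code): the raw pointer is what the writer actually writes
to, and the `shared_ptr` only exists to keep a caller-supplied stream alive
for the writer's lifetime:

```cpp
#include <memory>
#include <utility>

#include "arrow/io/interfaces.h"

// Illustrative sketch of the two-member sink pattern, mirroring what the
// IPC and Parquet writers do; the class name is hypothetical.
class WriterSketch {
 public:
  // Caller keeps ownership; the stream must outlive the writer.
  explicit WriterSketch(arrow::io::OutputStream* sink) : sink_(sink) {}

  // Writer shares ownership, so the stream cannot be destroyed under it.
  explicit WriterSketch(std::shared_ptr<arrow::io::OutputStream> sink)
      : sink_(sink.get()), owned_sink_(std::move(sink)) {}

 private:
  arrow::io::OutputStream* sink_;  // always set; used for all writes
  std::shared_ptr<arrow::io::OutputStream> owned_sink_;  // null if non-owning
};
```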
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]