mapleFU commented on code in PR #47524:
URL: https://github.com/apache/arrow/pull/47524#discussion_r2330713523


##########
cpp/src/arrow/csv/writer.cc:
##########
@@ -594,18 +604,39 @@ class CSVWriterImpl : public ipc::RecordBatchWriter {
     // Only called once, as part of initialization
     RETURN_NOT_OK(data_buffer_->Resize(CalculateHeaderSize(), 
/*shrink_to_fit=*/false));
     char* next = reinterpret_cast<char*>(data_buffer_->mutable_data());
+    int64_t no_quoting_count = 0;
     for (int col = 0; col < schema_->num_fields(); ++col) {
-      *next++ = '"';
-      next = Escape(schema_->field(col)->name(), next);
-      *next++ = '"';
+      const std::string& col_name = schema_->field(col)->name();
+      switch (options_.quoting_header) {
+        case QuotingStyle::None:
+          if (StopAtStructuralChar(reinterpret_cast<const 
uint8_t*>(col_name.c_str()),
+                                   col_name.length(), options_.delimiter) ==
+              static_cast<int64_t>(col_name.length())) {
+            memcpy(next, col_name.data(), col_name.size());
+            next += col_name.size();
+            no_quoting_count += 1;
+            break;
+          }
+          [[fallthrough]];
+        default:

Review Comment:
   Can we add missing two branches without default?



##########
cpp/src/arrow/csv/writer.cc:
##########
@@ -594,18 +604,39 @@ class CSVWriterImpl : public ipc::RecordBatchWriter {
     // Only called once, as part of initialization
     RETURN_NOT_OK(data_buffer_->Resize(CalculateHeaderSize(), 
/*shrink_to_fit=*/false));

Review Comment:
   Actually I don't know would calculateHeaderSize can bettering handle this 
without a round of resize. But current also looks ok to me



##########
cpp/src/arrow/csv/options.h:
##########
@@ -188,6 +188,12 @@ struct ARROW_EXPORT WriteOptions {
   /// Whether to write an initial header line with column names
   bool include_header = true;
 
+  /// \brief Quoting style of header
+  ///
+  /// If `quoting_header` is `QuotingStyle::None`, then only write quotes when 
a column
+  /// name contains structural characters. Otherwise, always quote column 
names.
+  QuotingStyle quoting_header = QuotingStyle::AllValid;

Review Comment:
   I see, this is because `QuotingStyle::Needed` is too relaxed now, which 
makes every string generate a quote value ( Might for performance ). Personally 
I think we can:
   1. Return error when None is configed and has error
   2. If Needed , follow None behavior now
   3. Default AllValid, add quotes everywhere
   
   Also cc @pitrou for idea



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to