alamb commented on code in PR #8960:
URL: https://github.com/apache/arrow-rs/pull/8960#discussion_r2612189917


##########
arrow-csv/src/writer.rs:
##########
@@ -157,7 +163,22 @@ impl<W: Write> Writer<W> {
                         col_idx + 1
                     ))
                 })?;
-                byte_record.push_field(buffer.as_bytes());
+
+                // Apply whitespace trimming if options are enabled and the 
column is a string type
+                let field_bytes = if should_trim && 
batch.column(col_idx).data_type() == &DataType::Utf8 {

Review Comment:
   There are other string types as well, such as LargeUtf8 and Utf8View. Could 
you please  add support (and a test) for them as well?



##########
arrow-csv/examples/whitespace_handling.rs:
##########
@@ -0,0 +1,91 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_array::*;
+use arrow_csv::WriterBuilder;
+use arrow_schema::*;
+use std::sync::Arc;
+
+fn main() {

Review Comment:
   thank you for this example. However, I think this might be hard to find
   
   Would you be willing to move this example into a doc example in  
https://github.com/apache/arrow-rs/blob/7c6a883302551dde7e89bfed1779c74dac677a0a/arrow-csv/src/writer.rs#L20-L19
   
   Perhaps as a way to show how to use options



##########
arrow-csv/src/writer.rs:
##########
@@ -389,6 +416,28 @@ impl WriterBuilder {
         self.null_value.as_deref().unwrap_or(DEFAULT_NULL_VALUE)
     }
 
+    /// Set whether to ignore leading whitespace in string values
+    pub fn with_ignore_leading_whitespace(mut self, ignore: bool) -> Self {
+        self.ignore_leading_whitespace = ignore;
+        self
+    }
+
+    /// Get whether to ignore leading whitespace in string values
+    pub fn ignore_leading_whitespace(&self) -> bool {
+        self.ignore_leading_whitespace
+    }
+
+    /// Set whether to ignore trailing whitespace in string values

Review Comment:
   Similarly to above, can we add an example? 
   
   ```rust
   /// 
   /// For example, a string values such as "foo. " will be written as "foo"
   ```



##########
arrow-csv/src/writer.rs:
##########
@@ -389,6 +416,28 @@ impl WriterBuilder {
         self.null_value.as_deref().unwrap_or(DEFAULT_NULL_VALUE)
     }
 
+    /// Set whether to ignore leading whitespace in string values

Review Comment:
   I would have personally suggested using "strip leading whitespace" but I see 
this is consistent with Spark
   
   Perhaps you could add an example to this documentation like
   
   ```rust
   /// 
   /// For example, a string values such as "  foo" will be written as "foo"
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to