westonpace commented on a change in pull request #10202:
URL: https://github.com/apache/arrow/pull/10202#discussion_r627653607



##########
File path: cpp/src/arrow/csv/parser.cc
##########
@@ -76,9 +76,45 @@ class PresizedDataWriter {
     parsed_[parsed_size_++] = static_cast<uint8_t>(c);
   }
 
+  // Push the value of a fully complete field. This should only be used to fill in missing
+  // values. This method can reallocate the buffer if there isn't enough extra space for
+  // the field.
+  Status PushField(const std::string& field) {
+    if (field.length() > extra_allocated_) {
+      // just in case this happens more allocate enough for 10x this amount
+      auto to_allocate = static_cast<uint32_t>(
+          std::max(field.length() * 10, static_cast<std::string::size_type>(128)));

Review comment:
       > My original intent was to write a way to handle rows with incorrect 
number of columns and not add nulls or truncate the rows but instead record 
them in a custom handler.
   
   That's fine, you're welcome to whatever intent you have :).  Can you create
a JIRA or add a comment to the other JIRA describing your needs?  That will
help others in evaluating the feature.
   
   > With that being said I would strongly like to be able to keep the custom 
handlers in the API.
   
   Arrow doesn't do a lot of "calling out" (invoking user-supplied callbacks)
today, but that might just be happenstance.  I'll let others more knowledgeable
than me chime in on the subject.
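
To make the custom-handler idea concrete, here is a rough, self-contained sketch of a parser that reports rows with the wrong number of columns to a caller-supplied callback instead of padding them with nulls or truncating them. Everything in it (`RowAction`, `BadRow`, `BadRowHandler`, `SplitRow`, `ParseRows`) is a hypothetical name made up for illustration. It is not Arrow's API and not the code in this PR, and the splitting is deliberately naive (no quoting or escaping).

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; none of these are Arrow APIs.
enum class RowAction { kSkip, kError };

struct BadRow {
  std::size_t line_number;       // 1-based line number in the input
  std::size_t expected_columns;  // column count the caller asked for
  std::size_t actual_columns;    // column count found in this row
  std::string text;              // raw text of the offending row
};

using BadRowHandler = std::function<RowAction(const BadRow&)>;

// Splits a line on ',' with no support for quoting or escaping.
std::vector<std::string> SplitRow(const std::string& line) {
  std::vector<std::string> fields;
  std::string field;
  for (char c : line) {
    if (c == ',') {
      fields.push_back(field);
      field.clear();
    } else {
      field.push_back(c);
    }
  }
  fields.push_back(field);
  return fields;
}

// Parses `input` line by line. Rows whose column count differs from
// `expected_columns` are passed to `handler` rather than being appended to
// `out`. Returns false as soon as the handler returns RowAction::kError.
bool ParseRows(const std::string& input, std::size_t expected_columns,
               const BadRowHandler& handler,
               std::vector<std::vector<std::string>>* out) {
  std::istringstream stream(input);
  std::string line;
  std::size_t line_number = 0;
  while (std::getline(stream, line)) {
    ++line_number;
    std::vector<std::string> fields = SplitRow(line);
    if (fields.size() != expected_columns) {
      BadRow bad{line_number, expected_columns, fields.size(), line};
      if (handler(bad) == RowAction::kError) return false;
      continue;  // kSkip: drop the row and keep parsing
    }
    out->push_back(std::move(fields));
  }
  return true;
}
```

A caller that wants to record bad rows could pass a lambda that appends each `BadRow` to its own vector and returns `RowAction::kSkip`. The trade-off is that the parser now calls into user code on its hot path, which is the design question raised above.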



