westonpace commented on a change in pull request #10202:
URL: https://github.com/apache/arrow/pull/10202#discussion_r627653607
##########
File path: cpp/src/arrow/csv/parser.cc
##########
@@ -76,9 +76,45 @@ class PresizedDataWriter {
parsed_[parsed_size_++] = static_cast<uint8_t>(c);
}
+ // Push the value of a fully complete field. This should only be used to fill in missing
+ // values. This method can reallocate the buffer if there isn't enough extra space for
+ // the field.
+ Status PushField(const std::string& field) {
+ if (field.length() > extra_allocated_) {
+ // just in case this happens more allocate enough for 10x this amount
+ auto to_allocate = static_cast<uint32_t>(
+ std::max(field.length() * 10, static_cast<std::string::size_type>(128)));
Review comment:
> My original intent was to write a way to handle rows with incorrect
> number of columns and not add nulls or truncate the rows but instead record
> them in a custom handler.

That's fine, you're welcome to whatever intent you have :). Can you create a
JIRA issue, or add a comment to the other JIRA issue, describing your needs?
That will help others evaluate the feature.

> With that being said I would strongly like to be able to keep the custom
> handlers in the API.

Arrow doesn't do a lot of "calling out" (invoking user-supplied callbacks)
today, but that might just be happenstance. I'll let others more knowledgeable
than me chime in on the subject.
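
For anyone evaluating the idea, here is a rough, standalone sketch of what a
"record instead of pad or truncate" handler could look like. Everything in it
(RowAction, BadRow, BadRowHandler, SplitRow, ParseRows) is hypothetical and
made up for illustration; none of it is Arrow API, and the splitting is a
naive comma split with no quoting support, unlike the real CSV parser.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; these are NOT Arrow APIs.
enum class RowAction { kSkip, kError };

struct BadRow {
  std::size_t line_number;       // 1-based line number in the input
  std::size_t expected_columns;  // column count the caller asked for
  std::size_t actual_columns;    // column count found on this line
  std::string text;              // raw text of the offending row
};

// User-supplied callback invoked for every row with the wrong column count.
using BadRowHandler = std::function<RowAction(const BadRow&)>;

// Naive split on ','. Assumption: no quoting or escaping is handled.
std::vector<std::string> SplitRow(const std::string& line) {
  std::vector<std::string> fields;
  std::string field;
  for (char c : line) {
    if (c == ',') {
      fields.push_back(field);
      field.clear();
    } else {
      field.push_back(c);
    }
  }
  fields.push_back(field);
  return fields;
}

// Appends well-formed rows to *out. Malformed rows are neither padded with
// nulls nor truncated; they are passed to the handler, which decides whether
// to skip the row or abort. Returns false if the handler asked to abort.
bool ParseRows(const std::string& input, std::size_t expected_columns,
               const BadRowHandler& handler,
               std::vector<std::vector<std::string>>* out) {
  std::istringstream stream(input);
  std::string line;
  std::size_t line_number = 0;
  while (std::getline(stream, line)) {
    ++line_number;
    std::vector<std::string> fields = SplitRow(line);
    if (fields.size() != expected_columns) {
      BadRow bad{line_number, expected_columns, fields.size(), line};
      if (handler(bad) == RowAction::kError) return false;
      continue;  // handler chose to skip this row
    }
    out->push_back(std::move(fields));
  }
  return true;
}
```

A caller that wants to record bad rows would pass a lambda that appends each
BadRow to its own container and returns RowAction::kSkip. The point of the
sketch is only that the policy lives in user code rather than in the parser.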