cyb70289 commented on a change in pull request #12504:
URL: https://github.com/apache/arrow/pull/12504#discussion_r816400744
##########
File path: cpp/src/arrow/csv/writer.cc
##########
@@ -328,6 +351,26 @@ class QuotedColumnPopulator : public ColumnPopulator {
}
private:
+  // Returns true if there's no quote in the string array
+  // similar to std::find, but with much better performance
+  static bool NoQuoteInArray(const StringArray& array) {
+    const uint8_t* data = array.raw_data() + array.value_offset(0);
+    const int64_t buffer_size = array.total_values_length();
+    const uint8_t* const data_end = data + buffer_size;
+
+    for (int64_t i = 0; i < buffer_size / 16; ++i) {
+      bool r = false;
+      for (int i = 0; i < 16; ++i) {
+        r |= data[i] == '"';
+      }
Review comment:
`memchr` performs very well in my tests, so I'll use it. I don't know why
the compiler doesn't vectorize `std::find` here.
`CountQuotes` hurts performance a bit (about a 30% drop for StringNoQuote/0).
My guess is that it does more work than necessary.
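
For reference, a minimal sketch of what the `memchr`-based check might look like. It works on a plain byte buffer rather than a `StringArray`, and the helper names `NoQuoteInBuffer` and `NoQuoteInString` are hypothetical, not the code that will land in the PR:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not from the PR): returns true if the byte range
// [data, data + size) contains no '"' character.
static bool NoQuoteInBuffer(const uint8_t* data, int64_t size) {
  // Guard the empty case so memchr is never called with a null pointer.
  if (data == nullptr || size <= 0) return true;
  return std::memchr(data, '"', static_cast<std::size_t>(size)) == nullptr;
}

// Convenience wrapper for trying the check on an ordinary std::string.
static bool NoQuoteInString(const std::string& s) {
  return NoQuoteInBuffer(reinterpret_cast<const uint8_t*>(s.data()),
                         static_cast<int64_t>(s.size()));
}
```

In the populator, the buffer would presumably be the same range the hunk above computes, i.e. `array.raw_data() + array.value_offset(0)` with length `array.total_values_length()`.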