bkietz commented on code in PR #37792:
URL: https://github.com/apache/arrow/pull/37792#discussion_r1337480232


##########
cpp/src/arrow/array/builder_binary.h:
##########
@@ -463,6 +464,238 @@ class ARROW_EXPORT LargeStringBuilder : public LargeBinaryBuilder {
   std::shared_ptr<DataType> type() const override { return large_utf8(); }
 };
 
+// ----------------------------------------------------------------------
+// BinaryViewBuilder, StringViewBuilder
+//
+// These builders do not support building raw pointer view arrays.
+
+namespace internal {
+
+// We allocate medium-sized memory chunks and accumulate data in those, which
+// may result in some waste if there are many large-ish strings. If a string
+// comes along that does not fit into a block, we allocate a new block and
+// write into that.
+//
+// Later we can implement optimizations to continue filling underfull blocks
+// after encountering a large string that required allocating a new block.
+class ARROW_EXPORT StringHeapBuilder {
+ public:
+  static constexpr int64_t kDefaultBlocksize = 32 << 10;  // 32KB

Review Comment:
   1. 32KB is the default in Velox, but I'm happy to change it.
   2. In most cases, a larger buffer size should be enough to prevent chunking. If necessary, I could add `StringHeapBuilder::realloc_instead_of_chunking_`, but removing chunking altogether would hinder follow-ups such as adding a configurable chunk count to random generation.
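To make the trade-off in point 2 concrete, here is a minimal sketch of the chunking strategy described in the `StringHeapBuilder` comment. `ChunkedStringHeap`, `Location`, `Append`, and `Get` are hypothetical names, not Arrow's API. The sketch ignores how the real builders write the views themselves and, as the comment describes, simply abandons an underfull block when a string does not fit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified illustration of the chunking strategy described in
// the StringHeapBuilder comment. This is NOT Arrow's implementation or API.
class ChunkedStringHeap {
 public:
  // Where a stored string lives: block number and byte offset within that block.
  struct Location {
    int32_t block_index;
    int32_t offset;
  };

  explicit ChunkedStringHeap(int64_t block_size = 32 << 10)  // 32KB, as in the PR
      : block_size_(block_size) {}

  // Copies `value` into the current block, or into a fresh block if it does not fit.
  Location Append(std::string_view value) {
    const int64_t size = static_cast<int64_t>(value.size());
    if (blocks_.empty() || used_ + size > capacity_) {
      // Start a new block of block_size_ bytes, or exactly `size` bytes for an
      // oversized string. Any space left in the previous block is wasted.
      capacity_ = size > block_size_ ? size : block_size_;
      blocks_.emplace_back(static_cast<size_t>(capacity_));
      used_ = 0;
    }
    if (size > 0) {
      std::memcpy(blocks_.back().data() + used_, value.data(),
                  static_cast<size_t>(size));
    }
    Location loc{static_cast<int32_t>(blocks_.size() - 1),
                 static_cast<int32_t>(used_)};
    used_ += size;
    return loc;
  }

  // Reads back `length` bytes previously written at `loc`.
  std::string_view Get(Location loc, int32_t length) const {
    return std::string_view(blocks_[loc.block_index].data() + loc.offset,
                            static_cast<size_t>(length));
  }

  size_t num_blocks() const { return blocks_.size(); }

 private:
  int64_t block_size_;
  int64_t capacity_ = 0;  // capacity of the current (last) block
  int64_t used_ = 0;      // bytes used in the current block
  std::vector<std::vector<char>> blocks_;
};
```

With the default 32KB block size, a handful of short strings all land in block 0. Shrinking the block size to 16 bytes forces a new block whenever a string overflows the current one. That chunking is what a reallocate-instead-of-chunk option would avoid, at the cost of copying the existing data on each growth.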
