[ https://issues.apache.org/jira/browse/ARROW-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332781#comment-16332781 ]
ASF GitHub Bot commented on ARROW-1712:
---------------------------------------
xuepanchen commented on a change in pull request #1481: ARROW-1712: [C++] Add
method to BinaryBuilder to reserve space for value data
URL: https://github.com/apache/arrow/pull/1481#discussion_r162709422
##########
File path: cpp/src/arrow/array-test.cc
##########
@@ -1154,6 +1154,54 @@ TEST_F(TestBinaryBuilder, TestScalarAppend) {
}
}
}
+
+TEST_F(TestBinaryBuilder, TestCapacityReserve) {
+ vector<string> strings = {"a", "bb", "cc", "ddddd", "eeeee"};
+ int64_t N = static_cast<int64_t>(strings.size());
+ int64_t length = 0;
+ int64_t data_length = 0;
+ int64_t capacity = N;
+
+ ASSERT_OK(builder_->Reserve(capacity));
+ ASSERT_OK(builder_->ReserveData(capacity));
+
+ ASSERT_EQ(builder_->length(), length);
+ ASSERT_EQ(builder_->capacity(), BitUtil::NextPower2(capacity));
+ ASSERT_EQ(builder_->value_data_length(), data_length);
+ ASSERT_EQ(builder_->value_data_capacity(), capacity);
+
+ for (const string& str : strings) {
+ ASSERT_OK(builder_->Append(str));
+ length++;
+ data_length += static_cast<int64_t>(str.size());
+
+ ASSERT_EQ(builder_->length(), length);
+ ASSERT_EQ(builder_->capacity(), BitUtil::NextPower2(capacity));
+ ASSERT_EQ(builder_->value_data_length(), data_length);
+ if (data_length <= capacity) {
+ ASSERT_EQ(builder_->value_data_capacity(), capacity);
+ } else {
+ ASSERT_EQ(builder_->value_data_capacity(), data_length);
Review comment:
@wesm value_data_capacity() is actually always a multiple of 64 that is greater
than or equal to the amount of data appended so far, because the underlying
buffer is sized so that its capacity is a multiple of 64 bytes, as defined in
Layout.md. That is:
    ASSERT_EQ(BitUtil::RoundUpToMultipleOf64(data_length),
              builder_->value_data_capacity());
So if you call ReserveData(capacity) at the very beginning, then we have:
    ASSERT_EQ(BitUtil::RoundUpToMultipleOf64(capacity),
              builder_->value_data_capacity());
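The rounding rule can be checked with a small standalone C++ sketch. `RoundUpToMultipleOf64` below is a re-implementation written for illustration, not the Arrow `BitUtil` source, and `ExpectedDataCapacities` is a hypothetical helper that assumes the builder only grows its data buffer, to the next multiple of 64, when an append no longer fits. Under those assumptions, reserving 5 bytes and then appending the test's five strings (15 bytes in total) leaves the predicted capacity at 64 after every append.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Round a non-negative byte count up to the next multiple of 64.
// Standalone re-implementation for illustration; not Arrow's BitUtil code.
int64_t RoundUpToMultipleOf64(int64_t num) {
  assert(num >= 0);
  return (num + 63) & ~static_cast<int64_t>(63);
}

// Hypothetical helper: the value_data_capacity() predicted after
// ReserveData(reserved) and after each append of `strings`, ASSUMING the
// data buffer only grows, and then to RoundUpToMultipleOf64(data_length).
std::vector<int64_t> ExpectedDataCapacities(const std::vector<std::string>& strings,
                                            int64_t reserved) {
  std::vector<int64_t> capacities;
  int64_t capacity = RoundUpToMultipleOf64(reserved);
  int64_t data_length = 0;
  for (const std::string& str : strings) {
    data_length += static_cast<int64_t>(str.size());
    if (data_length > capacity) {
      // The appended data no longer fits: grow to the next 64-byte multiple.
      capacity = RoundUpToMultipleOf64(data_length);
    }
    capacities.push_back(capacity);
  }
  return capacities;
}
```

With a single 70-byte value instead of the test strings, the same rule predicts a capacity of 128.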
----------------------------------------------------------------
> [C++] Add method to BinaryBuilder to reserve space for value data
> -----------------------------------------------------------------
>
> Key: ARROW-1712
> URL: https://issues.apache.org/jira/browse/ARROW-1712
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Assignee: Panchen Xue
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> The {{Resize}} and {{Reserve}} methods only reserve space for the value
> offsets. When building binary/string arrays with a known size (or a
> reasonable estimate), it would be more efficient to reserve space for the
> value data once at the beginning to prevent repeated internal reallocations.
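To illustrate why reserving value data up front helps, here is a toy C++ model. `ToyBinaryBuilder`, its methods, its doubling growth policy and the helper `CountAppendReallocations` are all hypothetical and stand in for, rather than reproduce, Arrow's `BinaryBuilder`: `Reserve` sizes only the offsets, as the issue describes, while `ReserveData` sizes the value-data buffer in one step. The helper counts how many times the data buffer grows during the appends themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy model, not Arrow's BinaryBuilder. All value bytes live in
// one contiguous buffer; value i spans [offsets_[i], offsets_[i + 1]).
class ToyBinaryBuilder {
 public:
  ToyBinaryBuilder() { offsets_.push_back(0); }

  // Like the Reserve described in the issue: room for n more offsets only.
  void Reserve(int64_t n) { offsets_.reserve(offsets_.size() + static_cast<size_t>(n)); }

  // Room for nbytes more bytes of value data, allocated in one step.
  void ReserveData(int64_t nbytes) { GrowDataTo(data_length_ + nbytes); }

  void Append(const std::string& value) {
    const int64_t needed = data_length_ + static_cast<int64_t>(value.size());
    if (needed > data_capacity_) {
      // Assumed growth policy: at least double, to amortize reallocations.
      GrowDataTo(std::max(needed, 2 * data_capacity_));
    }
    std::copy(value.begin(), value.end(), data_.begin() + data_length_);
    data_length_ = needed;
    offsets_.push_back(data_length_);
  }

  std::string GetValue(int64_t i) const {
    assert(i >= 0 && i + 1 < static_cast<int64_t>(offsets_.size()));
    return std::string(data_.begin() + offsets_[i], data_.begin() + offsets_[i + 1]);
  }

  int64_t data_reallocations() const { return data_reallocations_; }

 private:
  void GrowDataTo(int64_t new_capacity) {
    if (new_capacity <= data_capacity_) return;
    data_.resize(static_cast<size_t>(new_capacity));  // stands in for realloc + copy
    data_capacity_ = new_capacity;
    ++data_reallocations_;
  }

  std::vector<int64_t> offsets_;
  std::vector<char> data_;
  int64_t data_length_ = 0;
  int64_t data_capacity_ = 0;
  int64_t data_reallocations_ = 0;
};

// Hypothetical helper: number of data-buffer growths that happen while
// appending `values`, optionally after reserving their exact total size.
int64_t CountAppendReallocations(const std::vector<std::string>& values,
                                 bool reserve_data_up_front) {
  ToyBinaryBuilder builder;
  builder.Reserve(static_cast<int64_t>(values.size()));
  if (reserve_data_up_front) {
    int64_t total = 0;
    for (const std::string& v : values) total += static_cast<int64_t>(v.size());
    builder.ReserveData(total);
  }
  const int64_t before = builder.data_reallocations();
  for (const std::string& v : values) builder.Append(v);
  return builder.data_reallocations() - before;
}
```

In this model, appending 100 eight-byte values without `ReserveData` grows the data buffer 8 times (8, 16, 32, ... up to 1024 bytes), whereas reserving the 800-byte total first means no growth happens during the appends at all.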
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)