[
https://issues.apache.org/jira/browse/ARROW-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420100#comment-16420100
]
ASF GitHub Bot commented on ARROW-2351:
---------------------------------------
gaolizhou commented on a change in pull request #1803: ARROW-2351 [C++]
StringBuilder::append(vector<string>...) not impleme…
URL: https://github.com/apache/arrow/pull/1803#discussion_r178223472
##########
File path: cpp/src/arrow/builder.cc
##########
@@ -1385,6 +1385,28 @@ const uint8_t* BinaryBuilder::GetValue(int64_t i,
int32_t* out_length) const {
StringBuilder::StringBuilder(MemoryPool* pool) : BinaryBuilder(utf8(), pool) {}
+Status StringBuilder::Append(const std::vector<std::string>& values,
+ uint8_t* null_bytes) {
+ std::size_t total_length = std::accumulate(
+ values.begin(), values.end(), 0ULL,
+ [](uint64_t sum, const std::string& str) { return sum + str.size(); });
Review comment:
Tested with below code, for-loop-solution and accumulate-solution almost has
same performance. To count 100000000 strings,
log1: for-loop-solution takes 212.312 ms, accumulate-solution takes 215.120
ms.
log2: for-loop-solution takes 214.900 ms, accumulate-solution takes 208.522
ms.
log1:
```
lizgao@lizgao-ubuntu:~/CLionProjects/cmake-test$ ./cmake_test
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0330 10:29:56.864248 3989 accmulate-lambda-performance.cc:15] Begin
for_loop_solution
I0330 10:29:57.076560 3989 accmulate-lambda-performance.cc:19] End
for_loop_solution = 8000000000
I0330 10:29:57.076603 3989 accmulate-lambda-performance.cc:23] Begin
accumulate_solution
I0330 10:29:57.291723 3989 accmulate-lambda-performance.cc:27] End
accumulate_solution = 8000000000
```
log2:
```
lizgao@lizgao-ubuntu:~/CLionProjects/cmake-test$ ./cmake_test
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0330 10:41:53.236687 4589 accmulate-lambda-performance.cc:15] Begin
for_loop_solution
I0330 10:41:53.451587 4589 accmulate-lambda-performance.cc:19] End
for_loop_solution = 8000000000
I0330 10:41:53.451625 4589 accmulate-lambda-performance.cc:23] Begin
accumulate_solution
I0330 10:41:53.660147 4589 accmulate-lambda-performance.cc:27] End
accumulate_solution = 8000000000
```
```
//
// Created by lizgao on 3/30/18.
//
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>
#include <glog/logging.h>
namespace {
void for_loop_solution(const std::vector<std::string> &values) {
std::size_t total_length = 0;
std::size_t cnt = values.size();
LOG(INFO) << "Begin for_loop_solution";
for (std::size_t i = 0; i < cnt; ++i) {
total_length += values[i].size();
}
LOG(INFO) << "End for_loop_solution = " << total_length;
}
void accumulate_solution(const std::vector<std::string> &values) {
LOG(INFO) << "Begin accumulate_solution";
std::size_t total_length = std::accumulate(
values.begin(), values.end(), 0ULL,
[](uint64_t sum, const std::string& str) { return sum + str.size();
});
LOG(INFO) << "End accumulate_solution = " << total_length;
}
}
void accumulate_lambda_performance_test() {
std::vector<std::string> str_list(100000000, std::string(80, '*'));
for_loop_solution(str_list);
accumulate_solution(str_list);
}
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> [C++] StringBuilder::append(vector<string>...) not implemented
> --------------------------------------------------------------
>
> Key: ARROW-2351
> URL: https://issues.apache.org/jira/browse/ARROW-2351
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.9.0
> Reporter: Rares Vernica
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> For {{StringBuilder}} an {{append(vector<string>, uint8_t*)}} function is
> [declared|https://github.com/apache/arrow/blob/7b2c79765cf92760e1f8cca079159d9613b86412/cpp/src/arrow/builder.h#L721]
> and
> [documented|http://arrow.apache.org/docs/cpp/classarrow_1_1_string_builder.html#a59be34b5e11017a392b4ee019d90da3c]
> but it does not seem to be implemented.
> {code:java}
> undefined reference to `arrow::StringBuilder::Append(std::vector<std::string,
> std::allocator<std::string> > const&, unsigned char*)'
> collect2: error: ld returned 1 exit status
> {code}
> Also worth noting is that the similar function in {{NumericBuilder}} uses
> {{vector<bool>}} for the null values instead of {{uint8_t*}}. It might be
> worth making them consistent.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)