[jira] [Commented] (ARROW-17721) [C++][Gandiva] Expression Evaluation Performance Improvement using Mimalloc

Jin Shang (Jira) Fri, 23 Sep 2022 01:05:00 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-17721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608613#comment-17608613
 ]


Jin Shang commented on ARROW-17721:
-----------------------------------

Hi Jiangtao, some updates for you.

For an expression with variable length output, such as string in your case, 
Gandiva would reallocate the output buffer to make space for results after the 
computation for each row. Mimalloc is not good at frequently reallocating large 
buffers, hence the performance you see. I create this issue: 
https://issues.apache.org/jira/browse/ARROW-17824 that may help solving the 
issue. 

In the meantime, does your workload often involves computing tens of thousands 
of rows in one batch? For smaller batch size, such as <1000, the performance of 
mimalloc is on par with jemalloc from what I observed. 

Also do check out arrow's compute engine at 
[https://arrow.apache.org/docs/cpp/compute.html] and see if it satisfies your 
need.

> [C++][Gandiva] Expression Evaluation Performance Improvement using Mimalloc
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-17721
>                 URL: https://issues.apache.org/jira/browse/ARROW-17721
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++ - Gandiva
>            Reporter: Jiangtao Peng
>            Assignee: Jin Shang
>            Priority: Major
>
> Arrow use jemalloc as default memory allocator. For some reason, I am going 
> to use mimalloc instead. But there seems have big performance difference 
> between two memory allocators.
> Here are my steps.
> I use simple compile options:
> {code:java}
> -DCMAKE_BUILD_TYPE=debug
> -DARROW_JEMALLOC=OFF|ON
> -DARROW_MIMALLOC=ON|OFF
> -DARROW_GANDIVA=ON
> -DARROW_GANDIVA_STATIC_LIBSTDCPP=ON
> -DARROW_BUILD_TESTS=ON
> {code}
>  
> Then I write a simple case:
> {code:cpp}
> #include <gtest/gtest.h>
> #include "arrow/memory_pool.h"
> #include "arrow/status.h"
> #include "gandiva/projector.h"
> #include "gandiva/tests/test_util.h"
> #include "gandiva/tree_expr_builder.h"
> #include <chrono>
> #include <iostream>
> namespace gandiva {
> using arrow::boolean;
> using arrow::date64;
> using arrow::int32;
> using arrow::int64;
> using arrow::utf8;
> class TestUtf8Perf : public ::testing::Test {
>  public:
>   void SetUp() { pool_ = arrow::default_memory_pool(); }
>  protected:
>   arrow::MemoryPool* pool_;
> };
> void TestPerf(int64_t char_length, int64_t num_records) {
>   // schema for input fields
>   auto field_a = field("a", utf8());
>   auto schema = arrow::schema({field_a});
>   // output fields
>   auto res = field("res", utf8());
>   auto node_a = TreeExprBuilder::MakeField(field_a);
>   auto upper_a = TreeExprBuilder::MakeFunction("upper", {node_a}, utf8());
>   auto expr = TreeExprBuilder::MakeExpression(upper_a, res);
>   // Build a projector for the expressions.
>   std::shared_ptr<Projector> projector;
>   auto status = Projector::Make(schema, {expr}, TestConfiguration(), 
> &projector);
>   EXPECT_TRUE(status.ok()) << status.message();
>   std::string val = std::string(char_length, 'a');
>   arrow::StringBuilder builder;
>   for (int i = 0; i < num_records; i++) {
>     auto _ = builder.Append(val);
>   }
>   std::shared_ptr<arrow::StringArray> array_a;
>   auto _ = builder.Finish(&array_a);
>   // prepare input record batch
>   auto in_batch = arrow::RecordBatch::Make(schema, num_records, {array_a});
>   auto start_epoch = std::chrono::duration_cast<std::chrono::milliseconds>(
>                          std::chrono::system_clock::now().time_since_epoch())
>                          .count();
>   // Evaluate expression
>   arrow::ArrayVector outputs;
>   status = projector->Evaluate(*in_batch, pool_, &outputs);
>   EXPECT_TRUE(status.ok()) << status.message();
>   std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(
>                    std::chrono::system_clock::now().time_since_epoch())
>                        .count() -
>                    start_epoch
>             << "ms" << std::endl;
> }
> TEST_F(TestUtf8Perf, TestMemoryAllocsPerf) {
>   TestPerf(20, 10000);
>   TestPerf(20, 100000);
>   TestPerf(200, 10000);
>   TestPerf(200, 100000);
>   TestPerf(2000, 10000);
> }
> }  // namespace gandiva
> {code}
> this case is going to calculate expression {*}upper(a){*}, *a* has different 
> size with 20/200/2000. Evaluation time results are:
> |char_length|num_records|Using Mimalloc (ms)|Using Jemalloc(ms)|
> |20|10000|29|3|
> |20|100000|2686|26|
> |200|10000|954|11|
> |200|100000|220153|118|
> |2000|10000|21162|89|
>  
> Is this performance gap expected? Or any other compile options should I note? 
> How to make performance better using mimalloc?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (ARROW-17721) [C++][Gandiva] Expression Evaluation Performance Improvement using Mimalloc

Reply via email to