tianjixuetu commented on issue #38370: URL: https://github.com/apache/arrow/issues/38370#issuecomment-1774892281
> Calling a compute function with scalars is like the volcano model in a database; each call pays for:
>
> 1. Finding the function (dispatch)
> 2. Detecting the input type
> 3. Computing -> this is the only logic we actually need
> 4. Wrapping the output
>
> The pure C++ code is a bit like `codegen` in a database system: you already know the types (though reading from a file might still be non-optimal), so computing with raw C++ on self-defined types will be faster. You can achieve similar performance with templates that compute the logic directly.
>
> So I don't think this is a good approach when you can fix the function call and know the input/output types. Also, when I ran the benchmark locally, the slow parts were mainly:
>
> 1. Setting up the framework
> 2. Dispatching the function
>
> So you may want to benchmark only the "compute time" rather than the whole run; the initialization of `arrow::compute` can take some time.
>
> Specifically, you can:
>
> ```c++
> auto registry = ::arrow::compute::GetFunctionRegistry();
> // Compute the rate of return
> auto start_time = std::chrono::high_resolution_clock::now();
> ```
>
> By the way, was this code generated by ChatGPT?

No. Your points make a lot of sense, but when Arrow is used as a standalone module for computation, especially from C++, the performance is unexpectedly slower than the Python code. That is hard to accept, and there is significant room for improvement. Are you familiar with Arrow? Do you have concrete techniques for reading and computing the data that get close to raw C++ speed?