RemHero commented on issue #39365:
URL: https://github.com/apache/arrow/issues/39365#issuecomment-1871199198
Thank you very much for your suggestions! I conducted some further
experiments based on your advice and found that the `batchSize` has a
significant impact on the computation of expressions.
Additionally, I implemented the computation using
`arrow::compute::Expression`. However, the performance still falls short
compared to the row-based implementation. Regarding the 'memory-boundary' you
mentioned, I understand that my calculation process involves a lot of memory
allocation and copying due to storing intermediate results. To address this, I
tested only a multiplication operation, eliminating unnecessary intermediate
result copies, and the results showed that the performance difference between
the two approaches was not significant.
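
A minimal sketch of the intermediate-result effect described above, written in
plain C++ rather than against Arrow's API. The function names
`MulAddWithIntermediate` and `MulAddFused` and the expression `a * b + c` are
hypothetical and chosen only for illustration; they are not taken from my
benchmark or from Arrow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an Arrow API): evaluates a * b + c in two steps,
// materializing the product in a temporary column before adding c.
std::vector<double> MulAddWithIntermediate(const std::vector<double>& a,
                                           const std::vector<double>& b,
                                           const std::vector<double>& c) {
  assert(a.size() == b.size() && b.size() == c.size());
  // Extra allocation and an extra pass over memory for the intermediate.
  std::vector<double> product(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    product[i] = a[i] * b[i];
  }
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = product[i] + c[i];
  }
  return out;
}

// Hypothetical helper (not an Arrow API): evaluates the same expression in a
// single pass, so no intermediate column is allocated or re-read.
std::vector<double> MulAddFused(const std::vector<double>& a,
                                const std::vector<double>& b,
                                const std::vector<double>& c) {
  assert(a.size() == b.size() && b.size() == c.size());
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = a[i] * b[i] + c[i];
  }
  return out;
}
```

Both loops run over contiguous buffers, so a compiler may auto-vectorize
either one; the difference this sketch isolates is only the extra allocation
and memory traffic of the intermediate column.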
Based on this, I'm not certain whether Arrow's compute functions use SIMD
optimization; that may require further investigation through disassembly.
Going forward, I plan to continue experimenting with the
`Array-wise ("vector") functions` mentioned in the official documentation and
to follow your advice to hand-code the computation expressions on top of
Arrow's columnar format.
I have referred to the documentation you provided, and I may also try
setting the appropriate SIMD optimization level during compilation.
I expect to have results within the next couple of days. @mapleFU