houqp opened a new pull request #1556:
URL: https://github.com/apache/arrow-datafusion/pull/1556


   # Which issue does this PR close?
   
   Closes the arrow2 milestone: https://github.com/apache/arrow-datafusion/milestone/3
   
   # Rationale for this change
   
   Provide a complete arrow2-based datafusion implementation for full 
evaluation of the migration. This should give us a good feel for the arrow2 
API UX as well as a starting point for performance benchmarks within datafusion 
and downstream projects.
   
   The goal is to merge the code into an official arrow2 branch in the short 
term and keep it there until we are comfortable making the switch in master.
   
   # What changes are included in this PR?
   
   * Switched to arrow2
   * Enabled miri test
   
   Here is a TPCH benchmark I ran on my Linux laptop:
   
   
![Screenshot_20220113_174918](https://user-images.githubusercontent.com/670302/149437604-27fca7a1-55e4-48fc-a02a-bc9ee7d2ed74.png)
   
   On average, we are getting around a 5% speedup across the board, with q5 at 
an 11% gain and q12 at only 1%. If this performance gain can also be replicated 
in downstream projects, then I think it would make a strong case for us to do 
the arrow2 switch.
   
   # Are there any user-facing changes?
   
   Yes, downstream consumers of datafusion will need to switch to arrow2 as well.
   
   



