[GitHub] [druid] paul-rogers commented on pull request #12745: Frame format for data transfer and short-term storage.

GitBox Wed, 06 Jul 2022 16:14:40 -0700


paul-rogers commented on PR #12745:
URL: https://github.com/apache/druid/pull/12745#issuecomment-1176845659


   @gianm, the Drill discussion I mentioned was spread across a number of 
venues and was mixed in with the perennial "should Drill use Arrow" 
discussion. Some "lessons learned" appear in the discussion of [this 
PR](https://github.com/apache/drill/pull/2412). The discussion then moved to [this 
issue](https://github.com/apache/drill/issues/2421). There were no conclusions, 
just varying points of view: observations about reality, and continued 
aspirations for the magical powers of columnar formats.
   
   Druid is different (it is worth repeating), so the specific issues don't 
apply. Two key points are still relevant:
   
   * Columnar is said to be faster, especially when the data is non-null and 
stored in direct memory, and when operations can be compiled down to SIMD 
instructions, such as simple aggregations. In practice, though, SQL uses nulls, 
our tools use Java, and only Gandiva has provided a SIMD implementation.
   * Columnar can lead to very poor performance during shuffles (exchanges) at 
scale because of the buffering issues mentioned above.
   
   Nothing from that discussion would cause this PR to change. It is just 
some "school of hard knocks" learning that we want to avoid repeating as 
we move ahead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
