paul-rogers commented on PR #12745: URL: https://github.com/apache/druid/pull/12745#issuecomment-1176845659
@gianm, the Drill discussion I mentioned was spread across a number of venues and was mixed in with the perennial "should Drill use Arrow?" discussion. Some "lessons learned" appear in the discussion of [this PR](https://github.com/apache/drill/pull/2412), and the discussion then moved to [this issue](https://github.com/apache/drill/issues/2421). There were no conclusions, just varying points of view: observations about reality, alongside continued hopes for the near-magical powers of columnar formats.

Druid is different (it is worth repeating), so the specific issues don't apply. Two key points are relevant:

* Columnar is said to be faster, especially with non-null data stored in direct memory and with operations, such as simple aggregations, that can be compiled down to SIMD instructions. In practice, though, SQL uses nulls, our tools use Java, and only Gandiva has provided a SIMD implementation.
* Columnar can lead to very poor performance during shuffles (exchanges) at scale because of the buffering issues mentioned above.

Nothing in that discussion suggests this PR should change. It is just some "school of hard knocks" learning that we want to avoid repeating as we move ahead.
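To make the nulls-versus-SIMD point concrete, here is a minimal Java sketch. It assumes a column stored as a primitive `long[]` with an optional validity bitmap held in a `java.util.BitSet`, using the Arrow-style convention that a set bit means "non-null". `ColumnarSumSketch` and its methods are hypothetical illustrations, not Druid or Drill APIs. The non-null loop is the shape a JIT may auto-vectorize. The nullable loop adds a per-row validity check, which can make that harder. Whether SIMD instructions are actually emitted depends on the JVM.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch contrasting sums over a non-null and a nullable long column.
 * Not Druid or Drill code; all names are illustrative only.
 */
public final class ColumnarSumSketch {

  private ColumnarSumSketch() {}

  /**
   * Non-null column: a simple counted loop over a primitive array.
   * Loops of this shape are candidates for JIT auto-vectorization,
   * but whether the JVM emits SIMD instructions is not guaranteed.
   */
  static long sumNonNull(long[] values) {
    long sum = 0;
    for (int i = 0; i < values.length; i++) {
      sum += values[i];
    }
    return sum;
  }

  /**
   * Nullable column: a validity bitmap marks which rows hold a value
   * (assumed convention: set bit = non-null). The per-row check adds a
   * branch inside the loop, which can make vectorization harder for the JIT.
   */
  static long sumNullable(long[] values, BitSet valid) {
    long sum = 0;
    for (int i = 0; i < values.length; i++) {
      if (valid.get(i)) {
        sum += values[i];
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    long[] values = {10, 20, 30, 40};
    BitSet valid = new BitSet(values.length);
    valid.set(0);
    valid.set(2);
    valid.set(3); // row 1 is treated as null, so its slot is ignored
    System.out.println(sumNonNull(values));         // prints 100
    System.out.println(sumNullable(values, valid)); // prints 80
  }
}
```

A branch-free variant, such as multiplying each value by 0 or 1 taken from the bitmap, is one common way to recover a vectorizable loop. It still costs extra work per row, however, which is part of why the "columnar is faster" claim tends to weaken once real SQL nulls are involved.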
