[
https://issues.apache.org/jira/browse/DRILL-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468282#comment-17468282
]
ASF GitHub Bot commented on DRILL-8088:
---------------------------------------
paul-rogers commented on pull request #2412:
URL: https://github.com/apache/drill/pull/2412#issuecomment-1004444610
One last note. Let's assume we wanted to adopt the row-based format (or, the
myths being strong, we want to adopt Arrow.) How would we go about it?
The "brute force" approach is to rewrite all the operators. Most deal with
low-level vector code, so we'd rewrite that with low-level row (or Arrow) code.
Since we can't really test until all operators are converted, we'd have to do
the entire conversion in one huge effort. Then, we get to debug. I hope this
approach is setting off alarm bells: it is high cost and high risk. This is why
Drill never seriously entertained the change.
But, there is another solution. The scan readers all used to work directly
with vectors. (Parquet still does.) Because of the memory reasons explained
above, we converted most of them to use EVF. As a result, we could swap vectors
for row pages (or Arrow) by changing the low-level code. Readers would be
blissfully ignorant of such changes because the higher-level abstractions would
be unchanged.
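To make the idea concrete, here is a minimal sketch of how a reader written against a writer abstraction stays the same when the storage underneath changes. Everything in it (`RowWriter`, `ColumnarStore`, `RowPageStore`, `loadSample`) is a hypothetical name invented for illustration; none of it is Drill's actual EVF API, and real vectors or Arrow buffers would be far more involved than Java lists.

```java
import java.util.ArrayList;
import java.util.List;

public class FormatSwapSketch {

  /** Hypothetical writer abstraction; stands in for the role EVF plays. */
  interface RowWriter {
    void startRow();
    void setInt(int col, int value);
    void setString(int col, String value);
    void endRow();
    int rowCount();
    Object get(int row, int col);
  }

  /** Column-oriented storage: one list per column (a toy stand-in for vectors). */
  static final class ColumnarStore implements RowWriter {
    private final List<List<Object>> columns = new ArrayList<>();
    private int rows;

    ColumnarStore(int columnCount) {
      for (int i = 0; i < columnCount; i++) {
        columns.add(new ArrayList<>());
      }
    }

    public void startRow() { for (List<Object> c : columns) c.add(null); }
    public void setInt(int col, int value) { columns.get(col).set(rows, value); }
    public void setString(int col, String value) { columns.get(col).set(rows, value); }
    public void endRow() { rows++; }
    public int rowCount() { return rows; }
    public Object get(int row, int col) { return columns.get(col).get(row); }
  }

  /** Row-oriented storage: one array per row (a toy stand-in for row pages). */
  static final class RowPageStore implements RowWriter {
    private final int columnCount;
    private final List<Object[]> page = new ArrayList<>();
    private Object[] current;

    RowPageStore(int columnCount) { this.columnCount = columnCount; }

    public void startRow() { current = new Object[columnCount]; }
    public void setInt(int col, int value) { current[col] = value; }
    public void setString(int col, String value) { current[col] = value; }
    public void endRow() { page.add(current); current = null; }
    public int rowCount() { return page.size(); }
    public Object get(int row, int col) { return page.get(row)[col]; }
  }

  /** "Reader" code: written once against the abstraction, unaware of the layout. */
  static void loadSample(RowWriter writer) {
    String[] names = {"a", "b", "c"};
    for (int i = 0; i < names.length; i++) {
      writer.startRow();
      writer.setInt(0, i + 1);
      writer.setString(1, names[i]);
      writer.endRow();
    }
  }

  public static void main(String[] args) {
    RowWriter[] stores = {new ColumnarStore(2), new RowPageStore(2)};
    for (RowWriter store : stores) {
      loadSample(store);  // identical reader code for both layouts
      System.out.println(store.getClass().getSimpleName() + ": "
          + store.rowCount() + " rows, row 1 = "
          + store.get(1, 0) + "," + store.get(1, 1));
    }
  }
}
```

The point of the sketch is only that `loadSample` never changes: swapping the in-memory layout means supplying a different implementation behind the same interface.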
So, a more sane way to approach a change of in-memory representations is to
first convert the other operators to use an EVF-like approach. (EVF for writing
new batches, a "Result Set Loader" for reading existing batches.) Such a change
can be done gradually, operator-by-operator, and is fully compatible with
other, non-converted operators. No big bang.
Once everything is upgraded to EVF, then we can swap out the in-memory
format. Maybe try Arrow. Try a row-based format. Run tests. Pick the winner.
This is *not* a trivial exercise, but it is doable over time, if we see
value and can muster the resources.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
> Improve expression evaluation performance
> -----------------------------------------
>
> Key: DRILL-8088
> URL: https://issues.apache.org/jira/browse/DRILL-8088
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Codegen
> Reporter: wtf
> Assignee: wtf
> Priority: Minor
>
> An unnecessary map copy occurs during expression evaluation. It slows down
> codegen when the query includes many "case when" expressions or avg/stddev
> calls (whose reduced expressions include "case when"). In our case, the query
> includes 314 avg calls, and it takes 3+ seconds to generate the projector
> expressions (Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz, 32 cores).
--
This message was sent by Atlassian Jira
(v8.20.1#820001)