Paul Rogers created DRILL-5779:
----------------------------------
Summary: HashAgg template is far too large, causes performance hit
Key: DRILL-5779
URL: https://issues.apache.org/jira/browse/DRILL-5779
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Paul Rogers
Drill uses code generation to produce query-specific code to copy values,
perform calculations, and so on. Drill does this by generating code based on
templates. Internally, Drill copies the template byte codes and merges them
with the generated byte codes. (Drill does not use Java subclassing for
generated code.)
The Hash Agg batch places thousands of lines of boilerplate code into the
template. This forces:
1. Drill to copy those byte codes *for every query*.
2. The "byte code fixup" logic to walk the byte code tree for the template *for
every query*.
3. The code cache to cache a separate copy of the template *for every query*.
There is a clear performance cost from doing the copying and tree walking, and
a memory cost from buffering multiple copies of the same code. We do not appear
to have any data showing that this work benefits the Drill user through better
stability, greater performance, or more features.
We should consider moving the bulk of the code out of the template to avoid the
overheads cited above. The result may be better performance and reduced memory
pressure.
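As a rough illustration of the proposal, the sketch below separates a small
query-specific piece from the shared boilerplate. All names here
(ThinTemplateSketch, GeneratedOps, SumOps, HashAggHelper) are hypothetical and
are not Drill's actual classes or APIs. The sketch assumes the boilerplate can
live in an ordinary class that is compiled and loaded once, so that only the
small hook would need to be generated, copied, and cached per query.

```java
import java.util.HashMap;
import java.util.Map;

public class ThinTemplateSketch {

    /** Hypothetical hook: the only part that would be generated per query. */
    interface GeneratedOps {
        long initialValue();                  // starting value for a new group
        long combine(long acc, long value);   // query-specific aggregate update
    }

    /** Stand-in for generated code; here it implements SUM. */
    static final class SumOps implements GeneratedOps {
        @Override public long initialValue() { return 0L; }
        @Override public long combine(long acc, long value) { return acc + value; }
    }

    /**
     * Hypothetical shared helper holding the boilerplate (table management,
     * lookup, bookkeeping). It is a plain class, so under the assumption above
     * it would not be copied or re-walked for each query.
     */
    static final class HashAggHelper {
        private final GeneratedOps ops;
        private final Map<Long, Long> table = new HashMap<>();

        HashAggHelper(GeneratedOps ops) {
            this.ops = ops;
        }

        /** Folds one input value into the group identified by key. */
        void add(long key, long value) {
            Long current = table.get(key);
            long acc = (current == null) ? ops.initialValue() : current;
            table.put(key, ops.combine(acc, value));
        }

        /** Returns the aggregate for key, or the initial value if unseen. */
        long get(long key) {
            Long current = table.get(key);
            return (current == null) ? ops.initialValue() : current;
        }

        int groupCount() {
            return table.size();
        }
    }

    public static void main(String[] args) {
        HashAggHelper helper = new HashAggHelper(new SumOps());
        helper.add(1L, 10L);
        helper.add(2L, 5L);
        helper.add(1L, 7L);
        System.out.println("groups = " + helper.groupCount());
        System.out.println("sum for key 1 = " + helper.get(1L));
        System.out.println("sum for key 2 = " + helper.get(2L));
    }
}
```

The design choice shown is delegation: the per-query piece shrinks to two small
methods, and the helper is reused as-is across queries. Whether this actually
improves performance or memory use in Drill would need to be measured.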