[ https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163524#comment-16163524 ]
Paul Rogers commented on DRILL-5783:
------------------------------------
Very good idea. The external sort work made a crude attempt at something
similar. (See the "managed" version of the {{ExternalSortBatch}}.)
In the sort, we created a "wrapper" that both generates the code and performs
the operation. The sort has multiple chunks of generated code, so this was a
useful approach.
Ideally, we'd leverage this work to implement better code caching. Today, we
use the source code as the key to the code cache. Since the source can be large
(hundreds of kilobytes) and we need two copies (to erase unique class names),
the memory needs add up.
A better approach would be to define a class that contains all the parameters
needed to generate the code. Objects of that class, which should be much smaller
than the code itself, would serve as the cache key used to check for an existing
matching definition.
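
As a rough illustration of the parameter-based key idea, here is a minimal
sketch in plain Java. It does not use any Drill classes: {{CodeGenKey}},
{{CodeGenCache}}, and the fields shown (template name, sort columns, null
ordering) are hypothetical stand-ins for whatever parameters actually drive
generation in a given operator. The only point it shows is that two keys built
from equal parameters compare equal, so the generator runs once per distinct
parameter set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical key: holds only the parameters that determine the generated
// code, not the generated source itself. Field choices are assumptions.
final class CodeGenKey {
  private final String templateName;       // which code template is generated
  private final List<String> sortColumns;  // e.g. "a ASC", "b DESC"
  private final boolean nullsFirst;

  CodeGenKey(String templateName, List<String> sortColumns, boolean nullsFirst) {
    this.templateName = Objects.requireNonNull(templateName);
    // Defensive, immutable copy so the key cannot change once it is in a map.
    this.sortColumns = Collections.unmodifiableList(new ArrayList<>(sortColumns));
    this.nullsFirst = nullsFirst;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof CodeGenKey)) {
      return false;
    }
    CodeGenKey other = (CodeGenKey) o;
    return nullsFirst == other.nullsFirst
        && templateName.equals(other.templateName)
        && sortColumns.equals(other.sortColumns);
  }

  @Override
  public int hashCode() {
    return Objects.hash(templateName, sortColumns, nullsFirst);
  }
}

// Hypothetical cache: V stands in for the compiled class or a factory for it.
final class CodeGenCache<V> {
  private final Map<CodeGenKey, V> cache = new ConcurrentHashMap<>();

  // Runs the generator only if no entry exists for an equal key.
  V get(CodeGenKey key, Function<CodeGenKey, V> generator) {
    return cache.computeIfAbsent(key, generator);
  }

  int size() {
    return cache.size();
  }
}
```

With a key like this, the cache holds a few small fields per entry rather than
copies of the source text. The key must capture every input that affects the
generated code; a missed parameter would cause two different definitions to
share one cache entry.
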
Regardless of the refactoring, consider reviewing the "managed" external sort
tests for some useful tools: a {{RowSet}} abstraction that lets you quickly and
easily set up input batches and verify output batches, a "sub-operator" test
framework, and examples of how to test code generation separately from the
full Drill server (that is, examples of how to break the tight coupling that
one normally has to deal with).
> Make code generation in the TopN operator more modular and test it
> ------------------------------------------------------------------
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Timothy Farkas
> Assignee: Timothy Farkas
>