[ https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163524#comment-16163524 ]

Paul Rogers commented on DRILL-5783:
------------------------------------

Very good idea. A crude attempt at something similar was made in the external 
sort work. (See the "managed" version of {{ExternalSortBatch}}.)

In the sort, we created a "wrapper" that generates the code and performs the 
operation. The sort has multiple chunks of generated code, so this was a useful 
approach.
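A minimal Java sketch of the wrapper idea, for illustration only: {{SortWrapper}} 
is a hypothetical name, not the actual class in the managed sort, and a 
{{Supplier}} producing a comparison function stands in for real Drill code 
generation (an assumption). The caller asks only for the operation; the wrapper 
decides when to generate the code and reuses it afterwards.

```java
import java.util.function.IntBinaryOperator;
import java.util.function.Supplier;

// Hypothetical wrapper that owns one chunk of "generated" code and exposes
// only the operation that code serves. The Supplier is a stand-in for the
// real code generator, not a Drill API.
final class SortWrapper {
  private final Supplier<IntBinaryOperator> codeGenerator;
  private IntBinaryOperator comparer; // built once, on first use
  private int generations;            // how many times code was generated

  SortWrapper(Supplier<IntBinaryOperator> codeGenerator) {
    this.codeGenerator = codeGenerator;
  }

  // Sorts in place, generating the comparison code the first time it is needed.
  void sort(int[] values) {
    if (comparer == null) {
      comparer = codeGenerator.get();
      generations++;
    }
    // Simple insertion sort driven by the generated comparer.
    for (int i = 1; i < values.length; i++) {
      int v = values[i];
      int j = i - 1;
      while (j >= 0 && comparer.applyAsInt(values[j], v) > 0) {
        values[j + 1] = values[j];
        j--;
      }
      values[j + 1] = v;
    }
  }

  int generationCount() {
    return generations;
  }
}
```

An operator with several chunks of generated code would hold one such wrapper 
per chunk, keeping generation details out of the operator itself.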

Ideally, we'd leverage this work to implement better code caching. Today, we 
use the source code as the key to the code cache. Since the source can be large 
(hundreds of KB) and we need two copies (to erase unique class names), the 
memory cost adds up.

A better approach would be to define a class that contains all the parameters 
needed to generate the code. Instances of that class, which should be much 
smaller than the code itself, would serve as the key when checking for an 
existing matching definition.
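A minimal Java sketch of a parameter-based cache key, assuming (for 
illustration only) that the generated TopN code depends just on the sort-key 
types, the sort directions and the null ordering. {{TopNCodeKey}}, 
{{CodeCache}} and their fields are hypothetical, not existing Drill classes; a 
real key would need every input that affects the generated source. Equality 
then compares a few small fields rather than hundreds of KB of source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical key holding every parameter that determines the generated code.
// The chosen fields are an assumption for illustration, not Drill's real inputs.
final class TopNCodeKey {
  private final List<String> sortKeyTypes; // e.g. "INT", "VARCHAR"
  private final List<Boolean> ascending;   // sort direction per key
  private final boolean nullsLast;

  TopNCodeKey(List<String> sortKeyTypes, List<Boolean> ascending, boolean nullsLast) {
    // Defensive copies so a key cannot change after it is placed in the cache.
    this.sortKeyTypes = Collections.unmodifiableList(new ArrayList<>(sortKeyTypes));
    this.ascending = Collections.unmodifiableList(new ArrayList<>(ascending));
    this.nullsLast = nullsLast;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof TopNCodeKey)) return false;
    TopNCodeKey other = (TopNCodeKey) o;
    return nullsLast == other.nullsLast
        && sortKeyTypes.equals(other.sortKeyTypes)
        && ascending.equals(other.ascending);
  }

  @Override
  public int hashCode() {
    return Objects.hash(sortKeyTypes, ascending, nullsLast);
  }
}

// Hypothetical cache that runs the generator only for keys it has not seen.
final class CodeCache<K, V> {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  private final AtomicInteger generations = new AtomicInteger();

  V getOrGenerate(K key, Function<? super K, ? extends V> generator) {
    return cache.computeIfAbsent(key, k -> {
      generations.incrementAndGet(); // counts actual generation work
      return generator.apply(k);
    });
  }

  int generationCount() {
    return generations.get();
  }
}
```

With this design the operator builds a key from its configuration and calls 
{{getOrGenerate}}; the source is only produced on a cache miss, so nothing 
large has to be kept around just for lookups.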

Regardless of the refactoring, consider reviewing the "managed" external sort 
tests for some useful tools: a {{RowSet}} abstraction that lets you quickly and 
easily set up input batches and verify output batches, a "sub-operator" test 
framework, and examples of how to test code generation separately from the 
full Drill server (that is, examples of how to break the tight coupling one 
normally has to deal with).

> Make code generation in the TopN operator more modular and test it
> ------------------------------------------------------------------
>
>                 Key: DRILL-5783
>                 URL: https://issues.apache.org/jira/browse/DRILL-5783
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
