[ 
https://issues.apache.org/jira/browse/DRILL-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903474#comment-15903474
 ] 

Paul Rogers commented on DRILL-5342:
------------------------------------

Code movements include:

* Move the bulk of the external sort implementation from {{ExternalSortBatch}} 
(ESB) to a new {{SortImpl}} class. The ESB becomes primarily an implementation 
of Drill's Volcano-inspired "next" protocol. {{SortImpl}} does the actual sort 
work. {{SortImpl}} has a narrow, sub-operator interface that does not depend on 
global state.

* Move memory management code to a new {{SortMemoryManager}} class to allow 
direct testing of that code. This code was heavily unit tested with the two 
memory algorithms (per-batch memory setup and consolidation spill decisions) 
undergoing significant simplification.

* The spill logic was split into two parts. Previously, the code would check a 
memory condition, then directly spill as a result of that decision. Now, the 
logic itself resides in the {{SortMemoryManager}}, takes the required 
parameters, and returns a consolidation decision: spill, merge or do nothing. 
The result is much easier to test. (Resulting in many code simplifications.)

* Move configuration into a new {{SortConfig}} class to allow direct unit 
testing.

* The prior version had taken a first crack at pulling the details of code 
generation into a separate class: {{OperatorCodeGen}}. This pass breaks the 
prior class into "wrappers" (need a better name) that encapsulate each 
algorithm for which we generate code. The wrapper holds the code generation 
logic (previously in {{OperatorCodeGen}} and the code that does the work to use 
the generated code (previously in various classes.)

* Some rationalization was done to the {{Sort}}, {{ExternalSort}} and {{Order}} 
operator definitions to move constants from inside code to declared constants 
and so on to allow unit testing.

* Similar adjustments were done to the operator definition base class: 
{{AbstractBase}}.

* Statistics-related code moved to a new {{SortMetrics}} class to allow testing 
of metrics gathering separate from the rest of the code. The metrics themselves 
did not change, only how they are packaged.

> Refactor "managed" external sort for unit tests
> -----------------------------------------------
>
>                 Key: DRILL-5342
>                 URL: https://issues.apache.org/jira/browse/DRILL-5342
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Tools, Build & Test
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.11.0
>
>
> The external sort was heavily refactored in 1.10 to allow adding memory 
> management. (See DRILL-5080.) That effort focused on breaking up the large 
> functions into smaller chunks to allow us to more easily modify the bits 
> relevant to adding memory management.
> This ticket discusses refining those changes to create modular bits that can 
> be unit tested individually. For the most part, code will remain unchanged 
> functionally, but functions will shift from class to class to break 
> dependencies. Also, interfaces will be narrowed to remove global dependencies 
> on things like {{FragmentContext}} or {{OperatorContext}}.
> The code reviewer(s), when using GitHub, will simply see large changes. Notes 
> below identify what was actually changed vs. what was merely moved.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to