[jira] [Commented] (DRILL-6030) Managed sort should minimize number of batches in a k-way merge

ASF GitHub Bot (JIRA) Wed, 20 Dec 2017 14:15:18 -0800

    [ https://issues.apache.org/jira/browse/DRILL-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299174#comment-16299174 ]


ASF GitHub Bot commented on DRILL-6030:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1075#discussion_r158150320
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortConfig.java ---
    @@ -84,7 +85,7 @@ public SortConfig(DrillConfig config) {
         if (limit > 0) {
           mergeLimit = Math.max(limit, MIN_MERGE_LIMIT);
         } else {
    -      mergeLimit = Integer.MAX_VALUE;
    +      mergeLimit = DEFAULT_MERGE_LIMIT;
    --- End diff --
    
    There may be a misunderstanding of how config options work. We define the 
defaults in Drill's own source code: `drill-module.conf` in each module. (Here 
it is in `java-exec`.)
    
    To change the default, we change the value in `drill-module.conf`. 
In the highly unlikely case that a user has overridden this value in 
`drill-override.conf`, their value will be used. But the option is not 
documented in `drill-override-example.conf`, so it is very unlikely that 
anyone has created an override. (The property is meant to be internal, for 
use in tests.)
    
    So, rather than introducing yet another variable, we might as well use the 
existing config property. This has the added advantage that, if experience 
suggests that we need a smaller or larger limit for some scenarios, we can make 
the adjustment in the field via the config system.
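
    As a rough illustration of the layering described above, the sketch below 
resolves a merge limit from two maps that stand in for `drill-module.conf` 
(defaults) and `drill-override.conf` (user overrides). It does not use Drill's 
own config classes; the property key, the default of 128, and the 
`MIN_MERGE_LIMIT` value of 2 are assumptions made up for the example. Only the 
`limit > 0` clamp mirrors the diff.

```java
import java.util.HashMap;
import java.util.Map;

class MergeLimitSketch {
  // Hypothetical property key, for illustration only.
  static final String MERGE_LIMIT_KEY = "sort.merge_limit";
  // Assumed lower bound; the real constant's value is not shown in the diff.
  static final int MIN_MERGE_LIMIT = 2;

  /** Layered lookup: a user override wins, otherwise the module default is used. */
  static int lookup(Map<String, Integer> moduleDefaults,
                    Map<String, Integer> overrides,
                    String key) {
    Integer value = overrides.get(key);
    if (value != null) {
      return value;
    }
    value = moduleDefaults.get(key);
    if (value == null) {
      throw new IllegalStateException("No default defined for " + key);
    }
    return value;
  }

  /** Applies the same clamp as the diff; a non-positive value means "no limit". */
  static int mergeLimit(Map<String, Integer> moduleDefaults,
                        Map<String, Integer> overrides) {
    int limit = lookup(moduleDefaults, overrides, MERGE_LIMIT_KEY);
    if (limit > 0) {
      return Math.max(limit, MIN_MERGE_LIMIT);
    }
    return Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    Map<String, Integer> defaults = new HashMap<>();
    defaults.put(MERGE_LIMIT_KEY, 128); // assumed default, set in one place
    Map<String, Integer> overrides = new HashMap<>();

    System.out.println("default only: " + mergeLimit(defaults, overrides));
    overrides.put(MERGE_LIMIT_KEY, 64); // a field adjustment, no code change
    System.out.println("with override: " + mergeLimit(defaults, overrides));
  }
}
```

    With this shape the default lives in the config layer rather than in a 
Java constant, so changing it does not require touching `SortConfig`.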


> Managed sort should minimize number of batches in a k-way merge
> ---------------------------------------------------------------
>
>                 Key: DRILL-6030
>                 URL: https://issues.apache.org/jira/browse/DRILL-6030
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Vlad Rozov
>            Assignee: Vlad Rozov
>
> The time complexity of the algorithm is O(n*k*log(k)), where k is the number 
> of batches to merge and n is the number of records in each batch (assuming 
> equal-size batches). As n*k is the total number of records to merge and it 
> can be quite large, minimizing k should give better results.
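
To make the O(n*k*log(k)) estimate concrete, here is a minimal heap-based k-way 
merge over in-memory lists. It is a generic textbook sketch, not Drill's merge 
implementation; the class and method names are hypothetical. Each of the n*k 
records is pushed to and popped from a heap that holds at most k entries, which 
is where the log(k) factor per record comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class KWayMergeSketch {
  /** Merges k sorted runs into one sorted list using a min-heap of size at most k. */
  static List<Integer> merge(List<List<Integer>> runs) {
    // Heap entries are {value, run index, position within that run}.
    PriorityQueue<int[]> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));

    // Seed the heap with the first record of every non-empty run.
    for (int r = 0; r < runs.size(); r++) {
      if (!runs.get(r).isEmpty()) {
        heap.add(new int[] {runs.get(r).get(0), r, 0});
      }
    }

    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] top = heap.poll();           // O(log k)
      out.add(top[0]);
      List<Integer> run = runs.get(top[1]);
      int next = top[2] + 1;
      if (next < run.size()) {
        heap.add(new int[] {run.get(next), top[1], next}); // O(log k)
      }
    }
    return out;
  }
}
```

For a fixed total record count n*k, only the log(k) factor depends on how the 
data is split into batches, so merging fewer, larger batches lowers the 
per-record cost.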



