[ https://issues.apache.org/jira/browse/DRILL-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299174#comment-16299174 ]
ASF GitHub Bot commented on DRILL-6030:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1075#discussion_r158150320

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortConfig.java ---
    @@ -84,7 +85,7 @@ public SortConfig(DrillConfig config) {
         if (limit > 0) {
           mergeLimit = Math.max(limit, MIN_MERGE_LIMIT);
         } else {
    -      mergeLimit = Integer.MAX_VALUE;
    +      mergeLimit = DEFAULT_MERGE_LIMIT;
    --- End diff --

    There may be a misunderstanding of how config options work. We define the defaults in Drill's own source code: `drill-module.conf` in each module. (Here it is in `java-exec`.) To change the default for an option, we change its value in `drill-module.conf`.

    In the highly unlikely case that a user has overridden this value in `drill-override.conf`, their value will be used. But the option is not documented in `drill-override-example.conf`, so it is very, very unlikely that anyone created an override. (The property is meant to be internal, for use in tests.)

    So, rather than introducing yet another variable, we might as well use the existing config property. This has the added advantage that, if experience suggests that we need a smaller or larger limit for some scenarios, we can make the adjustment in the field via the config system.


> Managed sort should minimize number of batches in a k-way merge
> ---------------------------------------------------------------
>
>                 Key: DRILL-6030
>                 URL: https://issues.apache.org/jira/browse/DRILL-6030
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Vlad Rozov
>            Assignee: Vlad Rozov
>
> The time complexity of the algorithm is O(n*k*log(k)), where k is the number of
> batches to merge and n is the number of records in each batch (assuming equal
> size batches). As n*k is the total number of records to merge and it can be
> quite large, minimizing k should give better results.

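The default-versus-override lookup described in the review comment can be sketched roughly as follows. This is a minimal, self-contained illustration and not Drill's actual `DrillConfig` code: the two maps stand in for `drill-module.conf` and `drill-override.conf`, and the key name `sort.merge_limit`, the default of 128, and the `MIN_MERGE_LIMIT` value of 2 are all assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only. The class name, key name, and numeric values
// below are hypothetical and are not taken from the Drill source.
final class MergeLimitSketch {

    // Hypothetical key; the real property name is not given in this thread.
    static final String MERGE_LIMIT_KEY = "sort.merge_limit";

    // Assumed lower bound, standing in for SortConfig.MIN_MERGE_LIMIT.
    static final int MIN_MERGE_LIMIT = 2;

    // Returns the user's override if one exists, otherwise the module default.
    static int resolve(Map<String, Integer> moduleDefaults,
                       Map<String, Integer> userOverrides,
                       String key) {
        Integer value = userOverrides.get(key);
        if (value == null) {
            value = moduleDefaults.get(key);
        }
        if (value == null) {
            throw new IllegalStateException("No default defined for " + key);
        }
        return value;
    }

    // Mirrors the branch shown in the diff before the proposed change:
    // a positive limit is clamped from below, anything else means "no limit".
    static int mergeLimit(int limit) {
        return limit > 0 ? Math.max(limit, MIN_MERGE_LIMIT) : Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Stand-in for drill-module.conf: the shipped default lives here.
        Map<String, Integer> moduleDefaults = new HashMap<>();
        moduleDefaults.put(MERGE_LIMIT_KEY, 128);

        // Stand-in for drill-override.conf: normally empty for this option.
        Map<String, Integer> userOverrides = new HashMap<>();

        int limit = resolve(moduleDefaults, userOverrides, MERGE_LIMIT_KEY);
        System.out.println("merge limit from default:  " + mergeLimit(limit));

        // A field adjustment needs only an override entry, not a code change.
        userOverrides.put(MERGE_LIMIT_KEY, 16);
        limit = resolve(moduleDefaults, userOverrides, MERGE_LIMIT_KEY);
        System.out.println("merge limit from override: " + mergeLimit(limit));
    }
}
```

Under these assumptions, changing the shipped default means editing one entry in the defaults file, and `SortConfig` needs no additional constant.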
-- This message was sent by Atlassian JIRA (v6.4.14#64029)