[
https://issues.apache.org/jira/browse/TAJO-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyunsik Choi resolved TAJO-561.
-------------------------------
Resolution: Won't Fix
After TAJO-36 and TAJO-584, ExternalSortExec is always the best executor for
sorting because it uses an in-memory algorithm until the input tuples exceed
the available memory.
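
The behavior described above (sort in memory first, fall back to disk only when the input outgrows the budget) can be illustrated with a minimal sketch. This is not Tajo's ExternalSortExec: the class name HybridSortSketch, the row-count budget standing in for a memory limit, and the use of plain text lines as tuples are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; not the actual ExternalSortExec. */
public class HybridSortSketch {
  private final int maxBufferedRows;                 // stand-in for a memory budget
  private final List<String> buffer = new ArrayList<>();
  private final List<Path> runs = new ArrayList<>(); // sorted runs spilled to disk

  public HybridSortSketch(int maxBufferedRows) {
    if (maxBufferedRows < 1) {
      throw new IllegalArgumentException("budget must be at least 1 row");
    }
    this.maxBufferedRows = maxBufferedRows;
  }

  /** Buffers a row (assumed to contain no line breaks); spills first if the buffer is full. */
  public void add(String row) throws IOException {
    if (buffer.size() == maxBufferedRows) {
      spill();
    }
    buffer.add(row);
  }

  public int spilledRuns() {
    return runs.size();
  }

  /** Sorts the buffered rows and writes them to a temporary file as one run. */
  private void spill() throws IOException {
    Collections.sort(buffer);
    Path run = Files.createTempFile("sort-run-", ".txt");
    Files.write(run, buffer);
    runs.add(run);
    buffer.clear();
  }

  /** Returns all rows in sorted order. A real executor would stream rather than build a list. */
  public List<String> finish() throws IOException {
    if (runs.isEmpty()) {
      // The input never exceeded the budget: this is a pure in-memory sort.
      Collections.sort(buffer);
      return new ArrayList<>(buffer);
    }
    if (!buffer.isEmpty()) {
      spill();
    }
    List<String> out = new ArrayList<>();
    List<BufferedReader> readers = new ArrayList<>();
    PriorityQueue<Head> heap = new PriorityQueue<>();
    try {
      for (Path run : runs) {
        BufferedReader reader = Files.newBufferedReader(run);
        readers.add(reader);
        String first = reader.readLine();
        if (first != null) {
          heap.add(new Head(first, reader));
        }
      }
      // k-way merge: repeatedly take the smallest head row among all runs.
      while (!heap.isEmpty()) {
        Head head = heap.poll();
        out.add(head.row);
        String next = head.reader.readLine();
        if (next != null) {
          heap.add(new Head(next, head.reader));
        }
      }
    } finally {
      for (BufferedReader reader : readers) {
        reader.close();
      }
      for (Path run : runs) {
        Files.deleteIfExists(run);
      }
      runs.clear();
    }
    return out;
  }

  /** Current head row of one run, ordered by row value. */
  private static final class Head implements Comparable<Head> {
    final String row;
    final BufferedReader reader;

    Head(String row, BufferedReader reader) {
      this.row = row;
      this.reader = reader;
    }

    @Override
    public int compareTo(Head other) {
      return row.compareTo(other.row);
    }
  }
}
```

With a budget that covers the whole input, the sketch never touches disk, which is why a separate in-memory executor adds little once the external one behaves this way.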
> PhysicalPlanner::createBestSortPlan should consider input size of leaf tasks
> ----------------------------------------------------------------------------
>
> Key: TAJO-561
> URL: https://issues.apache.org/jira/browse/TAJO-561
> Project: Tajo
> Issue Type: Bug
> Components: physical operator
> Reporter: Hyunsik Choi
> Fix For: 0.8-incubating
>
>
> Please take a look at the following code. It chooses a sort operator
> according to the input volume. There are two problems. First, the threshold
> is a hard-coded constant; it should be configurable. Second,
> estimateSizeRecursive does not obtain the input volume when a task is a
> leaf. We should fix both. In addition, I think estimateSizeRecursive should
> be given a more fitting name: the current one is vague and does not follow
> our naming convention.
> {code:java}
> public SortExec createBestSortPlan(TaskAttemptContext context, SortNode sortNode,
>                                    PhysicalExec child) throws IOException {
>   String [] outerLineage = PlannerUtil.getRelationLineage(sortNode.getChild());
>   long estimatedSize = estimateSizeRecursive(context, outerLineage);
>   final long threshold = 1048576L * 2000; // 2000 MiB, hard-coded
>   // if the relation size is less than the threshold,
>   // the in-memory sort will be used.
>   if (estimatedSize <= threshold) {
>     return new MemSortExec(context, sortNode, child);
>   } else {
>     return new ExternalSortExec(context, sm, sortNode, child);
>   }
> }
> {code}
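
For reference, the two changes proposed in the quoted description (a configurable threshold, and handling of an input size that could not be estimated) might have looked roughly like the sketch below. Everything in it is hypothetical: the class SortThresholdSketch, the property key, the use of java.util.Properties in place of Tajo's configuration, and the choice to treat a negative estimate as "unknown" and fall back to the external sort. None of these names come from the Tajo code base.

```java
import java.util.Properties;

/** Hypothetical illustration of the proposal; not Tajo code. */
public class SortThresholdSketch {
  /** Made-up configuration key for the in-memory sort threshold, in bytes. */
  public static final String THRESHOLD_KEY = "sketch.sort.inmemory.threshold.bytes";

  /** Same value as the hard-coded constant in the quoted snippet (2000 MiB). */
  public static final long DEFAULT_THRESHOLD_BYTES = 1048576L * 2000;

  public enum SortKind { IN_MEMORY, EXTERNAL }

  /** Reads the threshold from the configuration, falling back to the default. */
  public static long thresholdBytes(Properties conf) {
    String raw = conf.getProperty(THRESHOLD_KEY);
    return raw == null ? DEFAULT_THRESHOLD_BYTES : Long.parseLong(raw.trim());
  }

  /**
   * Picks a sort strategy from the estimated input size.
   * Assumption: a negative estimate means the size is unknown, in which
   * case the external sort is the safe choice.
   */
  public static SortKind chooseSort(long estimatedInputBytes, Properties conf) {
    if (estimatedInputBytes < 0) {
      return SortKind.EXTERNAL;
    }
    return estimatedInputBytes <= thresholdBytes(conf)
        ? SortKind.IN_MEMORY
        : SortKind.EXTERNAL;
  }
}
```

Falling back to the external sort for an unknown size mirrors the reasoning in the resolution: an executor that already sorts in memory while the input fits is never the wrong choice.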