[ https://issues.apache.org/jira/browse/CALCITE-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310420#comment-17310420 ]
Vladimir Sitnikov commented on CALCITE-4558:
--------------------------------------------
A pointer to a row has a fixed size no matter how many columns the row has.
In other words, a pointer swap has a fixed per-row cost that does not depend on
the row width or field count.
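A minimal Java sketch of the fixed per-row cost, assuming rows are held as `Object[]` arrays (an illustration only, not Calcite's row representation; `PointerSwapDemo` and its helpers are hypothetical names). `java.util.Arrays.sort` on an object array rearranges references, so sorting the same keys takes the same number of comparisons whether each row has 2 fields or 100.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class PointerSwapDemo {
    // Builds rowCount rows with `width` fields each; field 0 is the sort key.
    // The key sequence depends only on the seed, never on the width.
    static Object[][] makeRows(int rowCount, int width, long seed) {
        Random random = new Random(seed);
        Object[][] rows = new Object[rowCount][];
        for (int i = 0; i < rowCount; i++) {
            Object[] row = new Object[width];
            row[0] = random.nextInt(1_000);
            for (int f = 1; f < width; f++) {
                row[f] = f; // payload field, never read by the comparator
            }
            rows[i] = row;
        }
        return rows;
    }

    // Sorts the rows by field 0 and returns the number of comparisons made.
    static int countComparisons(int width) {
        Object[][] rows = makeRows(10_000, width, 42L);
        int[] comparisons = {0};
        Comparator<Object[]> byKey = (a, b) -> {
            comparisons[0]++;
            return Integer.compare((Integer) a[0], (Integer) b[0]);
        };
        // Sorting an object array rearranges references; field values stay put.
        Arrays.sort(rows, byKey);
        return comparisons[0];
    }

    public static void main(String[] args) {
        System.out.println("width 2:   " + countComparisons(2));
        System.out.println("width 100: " + countComparisons(100));
    }
}
```

Both widths report the same count: the comparator sees the same key sequence, and moving a row means moving one fixed-size reference regardless of how many fields it points to.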
A good sorting algorithm should not access fields that are not sort keys, so I
see no reason why you keep saying "proportional to the number of fields".
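To illustrate "should not access unused fields", here is a minimal Java sketch with a hypothetical instrumented row type (`CountingRow`, `onKeys` and `payloadReadsAfterSort` are illustrative names, not a Calcite API). The comparator reads only the sort-key fields, so the payload fields are never touched during the sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class KeyOnlyComparator {
    // Hypothetical row type that counts how often each field is read.
    static final class CountingRow {
        private final Object[] values;
        private final int[] reads;

        CountingRow(Object... values) {
            this.values = values;
            this.reads = new int[values.length];
        }

        Object get(int field) {
            reads[field]++;
            return values[field];
        }

        int readsOf(int field) {
            return reads[field];
        }
    }

    @SuppressWarnings("unchecked")
    private static int compareValues(Object a, Object b) {
        return ((Comparable<Object>) a).compareTo(b);
    }

    // Compares rows on the given key fields only, in order.
    static Comparator<CountingRow> onKeys(int... keyFields) {
        return (a, b) -> {
            for (int f : keyFields) {
                int c = compareValues(a.get(f), b.get(f));
                if (c != 0) {
                    return c;
                }
            }
            return 0;
        };
    }

    // Sorts 1,000 four-field rows on field 1 and returns how many times
    // the non-key fields (0, 2 and 3) were read during the sort.
    static int payloadReadsAfterSort() {
        List<CountingRow> rows = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            rows.add(new CountingRow("id-" + i, (i * 7_919) % 1_000, "x", 3.14));
        }
        rows.sort(onKeys(1));
        int payloadReads = 0;
        for (CountingRow row : rows) {
            payloadReads += row.readsOf(0) + row.readsOf(2) + row.readsOf(3);
        }
        return payloadReads;
    }

    public static void main(String[] args) {
        System.out.println("payload field reads: " + payloadReadsAfterSort());
    }
}
```

Adding more payload fields to `CountingRow` would not change the work done by `onKeys(1)`; only the number and type of key fields matter.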
The cost of an on-disk sort is different, since it might require multiple
passes, with the data stored and loaded again between passes.
The key factors for costing an on-disk sort are disk performance and the amount
of memory the algorithm can use for intermediate steps.
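To make "disk performance and memory" concrete, a sketch of the classic textbook estimate for an external merge sort (this is a standard model, not anything Calcite implements; `ExternalSortEstimate`, `passes` and `pageIo` are hypothetical names). Pass 0 writes sorted runs of `bufferPages` pages, each later pass merges `bufferPages - 1` runs, and every pass reads and writes each page once.

```java
public class ExternalSortEstimate {
    // Passes made by a textbook external merge sort over `pages` pages of data
    // using `bufferPages` pages of memory. Requires bufferPages >= 3 so that
    // each merge pass combines at least two runs.
    static int passes(long pages, long bufferPages) {
        if (bufferPages < 3) {
            throw new IllegalArgumentException("need at least 3 buffer pages");
        }
        if (pages <= 0) {
            return 0;
        }
        long runs = (pages + bufferPages - 1) / bufferPages; // ceil(pages / B)
        int passes = 1;
        while (runs > 1) {
            long fanIn = bufferPages - 1;
            runs = (runs + fanIn - 1) / fanIn; // ceil(runs / (B - 1))
            passes++;
        }
        return passes;
    }

    // Every pass reads and writes each page once.
    static long pageIo(long pages, long bufferPages) {
        return 2 * pages * passes(pages, bufferPages);
    }

    public static void main(String[] args) {
        System.out.println(passes(108, 5) + " passes, " + pageIo(108, 5) + " page I/Os");
        System.out.println(passes(108, 257) + " passes, " + pageIo(108, 257) + " page I/Os");
    }
}
```

With 5 buffer pages, 108 pages need 4 passes (864 page I/Os); with 257 buffer pages the data forms a single run and one pass suffices. The cost is driven by page count, memory and per-page disk cost, which is why it deserves a separate cost model from the in-memory case.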
I assume that systems that perform on-disk sorts would override the cost
function. If you want to support on-disk costing in Sort, please file a
separate JIRA for that.
For now, I want to fix the Sort cost so that it represents an in-memory sort.
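One way to express the in-memory costing proposed here, as a sketch under stated assumptions: the n * log2(n) comparison estimate, the per-key factor, and all names below are hypothetical illustrations, not Calcite's actual `Sort` cost code. The width-proportional variant mimics the kind of row-width multiplier this issue objects to, for contrast.

```java
public class SortCostSketch {
    // Rough comparison count of an in-memory comparison sort: n * log2(n).
    static double nLogN(double n) {
        return n <= 1 ? 0 : n * Math.log(n) / Math.log(2);
    }

    // Hypothetical width-proportional estimate: scales with the total field
    // count, like the row-width multiplier discussed in this issue.
    static double widthProportionalCost(double rowCount, int fieldCount) {
        return nLogN(rowCount) * fieldCount;
    }

    // Hypothetical in-memory estimate: each comparison reads the sort keys,
    // and moving a row moves one fixed-size reference, so the total field
    // count does not appear.
    static double inMemoryCost(double rowCount, int sortKeyCount) {
        return nLogN(rowCount) * Math.max(1, sortKeyCount);
    }

    public static void main(String[] args) {
        double rows = 1_000_000;
        for (int fields : new int[] {2, 20, 200}) {
            System.out.printf("fields=%d width-proportional=%.3e in-memory(1 key)=%.3e%n",
                    fields, widthProportionalCost(rows, fields), inMemoryCost(rows, 1));
        }
    }
}
```

The width-proportional figure grows 100x between 2 and 200 fields while the in-memory estimate stays flat, which matches the behavior of a reference-based sort.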
> Sort CPU cost should not incur per-field copy cost for alignment with filter
> and project
> ----------------------------------------------------------------------------------------
>
> Key: CALCITE-4558
> URL: https://issues.apache.org/jira/browse/CALCITE-4558
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.26.0
> Reporter: Vladimir Sitnikov
> Priority: Major
>
> Typical Java sort implementations do not copy rows (they copy references
> only), so it makes little sense to use "row width" as the key driver of sort
> costing.
> The CPU cost of filter does not include a "row copy" cost.
> Even though the implementations might be different, in-core costs should be
> aligned.
> For instance, the current EnumerableLimitSort and EnumerableSort have a
> bytesPerRow multiplier; however, the implementation does not copy rows
> field by field.
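The "copy references only" point can be checked directly. A minimal Java sketch, using `Object[]` rows as a stand-in for whatever row type an Enumerable implementation uses (an assumption; `ReferenceSortDemo` and its methods are hypothetical names): after `Arrays.sort`, every slot holds one of the original row objects, compared by identity, so no row was copied field by field.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.Map;

public class ReferenceSortDemo {
    // True if `sorted` holds the same row objects (by identity) as `original`,
    // i.e. the sort rearranged references and copied no rows.
    static boolean sameObjects(Object[][] original, Object[][] sorted) {
        if (original.length != sorted.length) {
            return false;
        }
        Map<Object[], Boolean> seen = new IdentityHashMap<>();
        for (Object[] row : original) {
            seen.put(row, Boolean.TRUE);
        }
        for (Object[] row : sorted) {
            if (!seen.containsKey(row)) {
                return false;
            }
        }
        return true;
    }

    static void sortByFirstField(Object[][] rows) {
        Arrays.sort(rows, Comparator.comparing((Object[] r) -> (Integer) r[0]));
    }

    // Sorts three rows and checks both the order and that no row was copied.
    static boolean sortKeepsOriginalRows() {
        Object[][] rows = {
            {3, "c", 30.0},
            {1, "a", 10.0},
            {2, "b", 20.0},
        };
        Object[][] before = rows.clone(); // shallow copy: the same row references
        sortByFirstField(rows);
        return rows[0][0].equals(1) && sameObjects(before, rows);
    }

    public static void main(String[] args) {
        System.out.println("sorted without copying rows: " + sortKeepsOriginalRows());
    }
}
```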