[
https://issues.apache.org/jira/browse/CALCITE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738551#comment-16738551
]
Vladimir Sitnikov commented on CALCITE-2635:
--------------------------------------------
{quote}@PerformanceTest(expectedDuration = "2s", variance = "5%"){quote}
The expected duration depends on the hardware. For instance, a notebook, virtual
machine, desktop, VPS, and so on could all have very different raw performance.
I think it is much better to invest time in having something like
https://arewefastyet.com
In other words, we could have a set of "standard" benchmarks + consistent
machine for execution + scheduled executions so we can track regressions.
I'm inclined to merge this fix with no extra tests.
Note: the change is a clear win.
An alternative option is to use a HashMap to speed up
{{org.apache.calcite.rel.type.RelDataType#getField(String fieldName, boolean
caseSensitive, boolean elideRecord)}}. We do have
{{org.apache.calcite.rel.type.RelDataTypeFactoryImpl#canonize(org.apache.calcite.rel.type.RelDataType)}},
so a lazily initialized cache of field positions might help.
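To illustrate the idea, here is a minimal sketch of a lazily initialized cache of field positions. It is not Calcite code: {{FieldIndexCache}} and its {{indexOf}} method are hypothetical names, and the sketch covers only the case-sensitive lookup (a case-insensitive variant would need a second map or normalized keys).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Calcite: caches field name -> position. */
final class FieldIndexCache {
  private final List<String> fieldNames;
  // Built on the first lookup; stays null until then.
  private Map<String, Integer> positions;

  FieldIndexCache(List<String> fieldNames) {
    this.fieldNames = fieldNames;
  }

  /** Returns the position of the field, or -1 if absent (case-sensitive). */
  synchronized int indexOf(String fieldName) {
    if (positions == null) {
      Map<String, Integer> map = new HashMap<>();
      for (int i = 0; i < fieldNames.size(); i++) {
        // putIfAbsent keeps the first occurrence, matching List.indexOf.
        map.putIfAbsent(fieldNames.get(i), i);
      }
      positions = map;
    }
    Integer pos = positions.get(fieldName);
    return pos == null ? -1 : pos;
  }
}
```

The first call pays O(N) to build the map, and later calls are expected O(1), so N lookups cost roughly O(N) in total instead of O(N^2).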
However, we don't really expect a single table to have lots of collations, so we
could just go with PR#891.
On top of that, we might add a hard limit like "try no more than the first 50
collations of the table", so even a table with an extreme number of collations
won't create a problem for {{getMonotonicity}}.
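A rough sketch of what such a hard limit could look like follows. The types here are simplified stand-ins, not Calcite's {{RelCollation}} or {{SqlMonotonicity}}, and the constant name {{MAX_COLLATIONS_TO_CHECK}} is made up; only the figure 50 comes from the suggestion above.

```java
import java.util.List;

/** Hypothetical stand-ins; not Calcite's RelCollation or SqlMonotonicity. */
final class CollationLimitSketch {
  enum Monotonicity { INCREASING, DECREASING, NOT_MONOTONIC }

  /** Simplified collation: leading field position plus its direction. */
  static final class Collation {
    final int leadingFieldIndex;
    final Monotonicity monotonicity;

    Collation(int leadingFieldIndex, Monotonicity monotonicity) {
      this.leadingFieldIndex = leadingFieldIndex;
      this.monotonicity = monotonicity;
    }
  }

  // Assumed cap: look at no more than the first 50 collations.
  static final int MAX_COLLATIONS_TO_CHECK = 50;

  static Monotonicity getMonotonicity(
      List<String> fieldNames, List<Collation> collations, String columnName) {
    int limit = Math.min(collations.size(), MAX_COLLATIONS_TO_CHECK);
    for (int i = 0; i < limit; i++) {
      Collation collation = collations.get(i);
      int index = collation.leadingFieldIndex;
      // Positional lookup by index, so no scan over the field names.
      if (index >= 0 && index < fieldNames.size()
          && fieldNames.get(index).equals(columnName)) {
        return collation.monotonicity;
      }
    }
    // Not found within the cap: conservatively report no monotonicity.
    return Monotonicity.NOT_MONOTONIC;
  }
}
```

The trade-off is that a column whose collation sits beyond the cap is reported as not monotonic, which only gives up an optimization.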
> getMonotonocity is slow on wide tables
> --------------------------------------
>
> Key: CALCITE-2635
> URL: https://issues.apache.org/jira/browse/CALCITE-2635
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Gian Merlino
> Assignee: Gian Merlino
> Priority: Major
> Labels: performance
>
> RelOptTableImpl's getMonotonicity does an indexOf on
> {{rowType.getFieldNames()}}, which is O(N) in the number of fields.
> IdentifierNamespace calls getMonotonicity once for every field in the table
> namespace, so it becomes O(N^2) in the number of fields. We observed 2-4
> second query planning times with a table that had 18,000 columns, reduced to
> about 150ms after patching getMonotonicity to be O(1) in the number of fields.