[
https://issues.apache.org/jira/browse/TRAFODION-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945592#comment-15945592
]
ASF GitHub Bot commented on TRAFODION-2376:
-------------------------------------------
GitHub user DaveBirdsall opened a pull request:
https://github.com/apache/incubator-trafodion/pull/1029
[TRAFODION-2376] Improve UPDATE STATS performance on varchar columns
This pull request submits a performance enhancement to the UPDATE
STATISTICS utility. This work is the completion of a prototype originally done
by Barry Fritchman (@blfritch).
For the moment, the feature is turned off by default. Use CQD
USTAT_COMPARE_VARCHARS 'ON' to turn on this enhancement.
This feature compacts varchars in memory on the internal sort code path of
UPDATE STATISTICS. In the old code, varchars are expanded out to their full
length. (Actually, we already truncate them at 256 characters, the setting of
CQD USTAT_MAX_CHAR_COL_LENGTH_IN_BYTES, which may give up some accuracy in UEC
computation but improves performance dramatically for very long varchar
columns.) In the new code, we estimate the average length of the column and
allocate space on the assumption that the column still adheres to that average.
For columns that already have statistics, we use the average varchar length
stored in SB_HISTOGRAMS column V2. For columns that don't, we guess that the
average is one-half the declared length of the column.
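The sizing rule described above can be sketched roughly as follows. This is an illustration only, not the code in the pull request: the function name, its parameters, and the choice to apply the truncation limit before halving are all assumptions made for the example.

```python
from typing import Optional


def estimated_varchar_bytes(declared_len: int,
                            stored_avg: Optional[int] = None,
                            max_len: int = 256) -> int:
    """Estimate the per-row space to reserve for one varchar column.

    declared_len: declared maximum length of the column.
    stored_avg:   average length from existing statistics, if any
                  (hypothetical stand-in for the value kept in SB_HISTOGRAMS).
    max_len:      truncation limit (stand-in for the
                  USTAT_MAX_CHAR_COL_LENGTH_IN_BYTES setting).
    """
    # Assumption: the existing truncation limit still applies first.
    effective_len = min(declared_len, max_len)
    if stored_avg is not None:
        # Existing statistics: trust the recorded average, capped at the limit.
        return min(stored_avg, effective_len)
    # No statistics yet: guess one-half of the (truncated) declared length.
    return (effective_len + 1) // 2
```

For example, under these assumptions a VARCHAR(1000) column with no statistics would be budgeted 128 bytes per row instead of 256, and one whose recorded average is 20 would be budgeted 20.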
The performance gain from using this feature comes from reducing the number
of scans of the table or sample table because more columns can fit in memory in
each scan.
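To make the scan-count argument concrete, here is a small model of how many passes are needed when columns are packed into a fixed memory budget. The greedy packing, the function name, and the numbers are hypothetical and chosen only for illustration; the real utility's batching logic may differ.

```python
from typing import Sequence


def scans_needed(col_widths: Sequence[int], rows: int, mem_budget: int) -> int:
    """Count table scans when columns are packed greedily, in order,
    into batches that must each fit within mem_budget bytes."""
    scans = 0
    used = 0
    for width in col_widths:
        cost = width * rows  # bytes needed to hold this column for all rows
        if scans == 0 or used + cost > mem_budget:
            # Start a new scan (a column larger than the budget still
            # gets a scan of its own in this simplified model).
            scans += 1
            used = cost
        else:
            used += cost
    return scans


rows = 1_000_000
budget = 300_000_000  # hypothetical memory budget in bytes

expanded = scans_needed([256, 256, 256], rows, budget)   # full-length varchars
compacted = scans_needed([128, 128, 128], rows, budget)  # half-length estimate
```

In this model the three expanded columns need one scan each, while the compacted columns fit two to a scan, so the table is read fewer times.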
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/DaveBirdsall/incubator-trafodion Trafodion2376
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-trafodion/pull/1029.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1029
----
commit 3366fdba1b9d52e7d04d21ee33f92698089cdb36
Author: Dave Birdsall <[email protected]>
Date: 2017-03-28T17:16:00Z
[TRAFODION-2376] Improve UPDATE STATS performance on varchar columns
----
> Improve UPDATE STATISTICS performance for varchar columns
> ---------------------------------------------------------
>
> Key: TRAFODION-2376
> URL: https://issues.apache.org/jira/browse/TRAFODION-2376
> Project: Apache Trafodion
> Issue Type: Bug
> Components: sql-cmp
> Affects Versions: 2.1-incubating
> Environment: All
> Reporter: David Wayne Birdsall
> Assignee: David Wayne Birdsall
>
> Today when UPDATE STATISTICS uses internal sort, varchar columns are expanded
> out to their maximum length with blank padding. This can be quite wasteful
> both of memory and CPU cycles, as often the average length of a varchar is
> much less (even orders of magnitude less) than the maximum length. We can do
> much better performance-wise by not doing this expansion, at the cost of some
> complexity in comparison.
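
One plausible source of the added comparison complexity is that unpadded values of different lengths must still compare as if they were blank-padded. The sketch below shows that idea only; it assumes single-byte blank padding and byte-wise ordering, and the function name is hypothetical rather than taken from the Trafodion source.

```python
def compare_padded(a: bytes, b: bytes) -> int:
    """Compare two unpadded varchar values as if the shorter one were
    padded with blanks. Returns -1, 0, or 1."""
    n = min(len(a), len(b))
    if a[:n] != b[:n]:
        # The common prefix already decides the order.
        return -1 if a[:n] < b[:n] else 1
    # Prefixes match: compare the longer value's tail against blanks.
    if len(a) > n:
        tail, sign = a[n:], 1
    else:
        tail, sign = b[n:], -1
    for byte in tail:
        if byte != 0x20:
            return sign if byte > 0x20 else -sign
    return 0  # the tail was all blanks, so the values are equal
```

With this rule, `b"abc"` and `b"abc   "` compare equal, just as they would after both were expanded to the same padded length.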
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)