GitHub user DaveBirdsall opened a pull request:
https://github.com/apache/incubator-trafodion/pull/1029
[TRAFODION-2376] Improve UPDATE STATS performance on varchar columns
This pull request submits a performance enhancement to the UPDATE
STATISTICS utility. This work is the completion of a prototype originally done
by Barry Fritchman (@blfritch).
For the moment, the feature is turned off by default. Use CQD
USTAT_COMPARE_VARCHARS 'ON' to turn on this enhancement.
What this feature does is compact varchars in memory for the internal sort
code path in UPDATE STATISTICS. In the old code, varchars are expanded out to
their full length. (Actually, we already truncate them at 256 characters -- the
setting of CQD USTAT_MAX_CHAR_COL_LENGTH_IN_BYTES -- giving up some accuracy in
UEC computation perhaps but improving performance dramatically for very long
varchar columns.) In the new code, we estimate the average length of the
column, and allocate space assuming the column still adheres to that average.
For columns that already have statistics, we use the average varchar length
stored in SB_HISTOGRAMS column V2. For columns that don't, we take a guess that
the average is one-half the declared length of the column.
The performance gain from using this feature comes from reducing the number
of scans of the table or sample table because more columns can fit in memory in
each scan.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/DaveBirdsall/incubator-trafodion Trafodion2376
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-trafodion/pull/1029.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1029
----
commit 3366fdba1b9d52e7d04d21ee33f92698089cdb36
Author: Dave Birdsall <[email protected]>
Date: 2017-03-28T17:16:00Z
[TRAFODION-2376] Improve UPDATE STATS performance on varchar columns
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---