GitHub user DaveBirdsall opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/1029

    [TRAFODION-2376] Improve UPDATE STATS performance on varchar columns

    This pull request submits a performance enhancement to the UPDATE 
STATISTICS utility. This work is the completion of a prototype originally done 
by Barry Fritchman (@blfritch).
    
    For the moment, the feature is turned off by default. Use CQD  
USTAT_COMPARE_VARCHARS 'ON' to turn on this enhancement.
    
    What this feature does is compact varchars in memory for the internal sort 
code path in UPDATE STATISTICS. In the old code, varchars are expanded out to 
their full length. (Actually, we already truncate them at 256 characters -- the 
setting of CQD USTAT_MAX_CHAR_COL_LENGTH_IN_BYTES -- giving up some accuracy in 
UEC computation perhaps but improving performance dramatically for very long 
varchar columns.) In the new code, we estimate the average length of the 
column, and allocate space assuming the column still adheres to that average. 
For columns that already have statistics, we use the average varchar length 
stored in SB_HISTOGRAMS column V2. For columns that don't, we take a guess that 
the average is one-half the declared length of the column.
    
    The performance gain from using this feature comes from reducing the number 
of scans of the table or sample table because more columns can fit in memory in 
each scan. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DaveBirdsall/incubator-trafodion Trafodion2376

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/1029.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1029
    
----
commit 3366fdba1b9d52e7d04d21ee33f92698089cdb36
Author: Dave Birdsall <[email protected]>
Date:   2017-03-28T17:16:00Z

    [TRAFODION-2376] Improve UPDATE STATS performance on varchar columns

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to