[ 
https://issues.apache.org/jira/browse/TRAFODION-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945592#comment-15945592
 ] 

ASF GitHub Bot commented on TRAFODION-2376:
-------------------------------------------

GitHub user DaveBirdsall opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/1029

    [TRAFODION-2376] Improve UPDATE STATS performance on varchar columns

    This pull request submits a performance enhancement to the UPDATE 
STATISTICS utility. This work is the completion of a prototype originally done 
by Barry Fritchman (@blfritch).
    
    For the moment, the feature is turned off by default. Use CQD  
USTAT_COMPARE_VARCHARS 'ON' to turn on this enhancement.
    
    What this feature does is compact varchars in memory for the internal sort 
code path in UPDATE STATISTICS. In the old code, varchars are expanded out to 
their full length. (Actually, we already truncate them at 256 characters -- the 
setting of CQD USTAT_MAX_CHAR_COL_LENGTH_IN_BYTES -- giving up some accuracy in 
UEC computation perhaps but improving performance dramatically for very long 
varchar columns.) In the new code, we estimate the average length of the 
column, and allocate space assuming the column still adheres to that average. 
For columns that already have statistics, we use the average varchar length 
stored in SB_HISTOGRAMS column V2. For columns that don't, we take a guess that 
the average is one-half the declared length of the column.
    
    The performance gain from using this feature comes from reducing the number 
of scans of the table or sample table because more columns can fit in memory in 
each scan. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DaveBirdsall/incubator-trafodion Trafodion2376

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/1029.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1029
    
----
commit 3366fdba1b9d52e7d04d21ee33f92698089cdb36
Author: Dave Birdsall <[email protected]>
Date:   2017-03-28T17:16:00Z

    [TRAFODION-2376] Improve UPDATE STATS performance on varchar columns

----


> Improve UPDATE STATISTICS performance for varchar columns
> ---------------------------------------------------------
>
>                 Key: TRAFODION-2376
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2376
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.1-incubating
>         Environment: All
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>
> Today when UPDATE STATISTICS uses internal sort, varchar columns are expanded 
> out to their maximum length with blank padding. This can be quite wasteful 
> both of memory and CPU cycles, as often the average length of a varchar is 
> much less (even orders of magnitude less) than the maximum length. We can do 
> much better performance wise by not doing this expansion, at the cost of some 
> complexity in comparison.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to