[ https://issues.apache.org/jira/browse/TRAFODION-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329117#comment-16329117 ]

David Wayne Birdsall commented on TRAFODION-2913:
-------------------------------------------------

Here is how I did the recalibration.
 # I created a table and, using gdb in IndexDesc::pruneMdam, determined the 
row size, block size and rows per block computed for that table.
 # I created a sqlci script that populates a similar table with 500,000 
rows, then uses that table to populate other tables of varying sizes: 1 block 
(= 4096 rows in this case), 10 blocks and 100 blocks. The latter tables have a 
two-column key. The UEC of the first column was controlled by using the MOD 
function when populating; the second column had unique values.
 # I then ran a python script that runs a query against a table, forcing a 
simple scan plan and various MDAM plans, and capturing the elapsed execution 
time of each. The script runs each query three times. (The first execution 
tends to be slower because it reads the table into the HBase cache; the second 
and third benefit from the table already being in cache, so the read time is 
amortized across the three executions.)
 # The tables I created were single-partition, so I looked only at serial 
simple scan and serial mdam 2 deep plan times. I noted the average execution 
times for each, and also noted which plan the Optimizer was picking (using the 
old MDAM costing code, which is the default choice today).

The inequality we are trying to tune in IndexDesc::pruneMdam is UEC <= Blocks 
* MDAM_SELECTION_DEFAULT, that is, UEC / Blocks <= MDAM_SELECTION_DEFAULT. We 
want this inequality to hold when MDAM plans win (or even come close to 
winning), and to be false when MDAM plans are bad.

I ran this test with UECs of 2, 20, 50, 80, 200, 500 and 800. The inequality 
above worked best when MDAM_SELECTION_DEFAULT was given a value of 8.0.
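Expressed as code, the decision being tuned amounts to the following Python sketch. The function and variable names here are illustrative only; the real check lives in IndexDesc::pruneMdam (optimizer/IndexDesc.cpp) and operates on the optimizer's internal cost scalars.

```python
# Sketch of the pruning inequality tuned above. Illustrative names;
# the actual check is in IndexDesc::pruneMdam (optimizer/IndexDesc.cpp).
MDAM_SELECTION_DEFAULT = 8.0  # the recalibrated value

def mdam_stays_in_play(uec, blocks):
    # Keep MDAM under consideration when the leading-column UEC is
    # small relative to the number of blocks in the table.
    return uec <= blocks * MDAM_SELECTION_DEFAULT
```

For example, UEC 200 on a 100-block table passes (200 <= 800), while UEC 800 on a 1-block table is pruned (800 > 8).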

The table shape I used is below:

For the following:

CREATE TABLE TRAFODION.SCH.T2

  (

    A                                INT NO DEFAULT NOT NULL NOT DROPPABLE NOT

      SERIALIZED

  , B                                INT NO DEFAULT NOT NULL NOT DROPPABLE NOT

      SERIALIZED

  , C                                INT DEFAULT NULL NOT SERIALIZED

  , PRIMARY KEY (A ASC, B ASC)

  )

 ATTRIBUTES ALIGNED FORMAT

;

In IndexDesc::pruneMdam, I saw the following for block size, row length and 
rows per block:

(gdb) p blockSize

$8 = {dpv_ = 65536, static ovflwCount_ = 0, static udflwCount_ = 0}

(gdb) p recordLength

$9 = {dpv_ = 16, static ovflwCount_ = 0, static udflwCount_ = 0}

(gdb) p recordsPerBlock

$10 = {dpv_ = 4096, static ovflwCount_ = 0, static udflwCount_ = 0}

(gdb)
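As a back-of-envelope check on those gdb values (illustrative only; the optimizer's own computation may round differently), rows per block is just the block size divided by the row length:

```python
# Sanity check of the gdb values above.
blockSize = 65536        # bytes, from "p blockSize"
recordLength = 16        # bytes, from "p recordLength"
recordsPerBlock = blockSize // recordLength  # 65536 / 16 = 4096
```

This matches the $10 value and explains why the 1-block table below holds 4096 rows.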

Below is the sqlci script I used to populate the tables. (I manually edited the 
script for different UECs.)

drop table if exists t2sourceuec200;

drop table if exists t2blk1uec200;

drop table if exists t2blk10uec200;

drop table if exists t2blk100uec200;

create table t2sourceuec200(a int not null, b int not null, c int, primary key 
(b,a)); -- key reversed for efficient upserts below

upsert using load into t2sourceuec200
select 
 mod(x1+10*x2+100*x3+1000*x4+10000*x5+100000*x6, 200),
 x1+10*x2+100*x3+1000*x4+10000*x5+100000*x6,
 x1+10*x2+100*x3+1000*x4+10000*x5+100000*x6
 from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) T(x1)
transpose 0,1,2,3,4,5,6,7,8,9 as x2
transpose 0,1,2,3,4,5,6,7,8,9 as x3
transpose 0,1,2,3,4,5,6,7,8,9 as x4 
transpose 0,1,2,3,4,5,6,7,8,9 as x5
transpose 0,1,2,3,4 as x6;
 
create table t2blk1uec200 (a int not null, b int not null, c int, primary key 
(a,b));

create table t2blk10uec200 like t2blk1uec200;

create table t2blk100uec200 like t2blk1uec200;

upsert using load into t2blk1uec200
 select a,b,c from t2sourceuec200 where b < 4096;

upsert using load into t2blk10uec200
 select a,b,c from t2sourceuec200 where b < 40960;

upsert using load into t2blk100uec200
 select a,b,c from t2sourceuec200 where b < 409600;

update statistics for table t2blk1uec200 on every column;

update statistics for table t2blk10uec200 on every column;

update statistics for table t2blk100uec200 on every column;
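As a cross-check of the generator above: the transpose expression enumerates digits x1..x6, so column b runs over 0..499,999 (500,000 rows), mod(b, 200) gives exactly 200 distinct values for a, and the b < 4096 filter yields exactly one block's worth of rows. A rough Python equivalent of the sqlci logic (illustrative only, using the same table shape):

```python
# Python equivalent of the sqlci data generator above (illustrative).
def generate_rows(uec):
    # x1..x5 each range over 0-9 and x6 over 0-4 in the sqlci script,
    # so b enumerates 0..499999; a = mod(b, uec) controls the UEC.
    for b in range(500000):
        yield (b % uec, b, b)  # (a, b, c) as in t2sourceuec200

rows = list(generate_rows(200))
distinct_a = len(set(a for (a, b, c) in rows))  # UEC of column a
one_block = [r for r in rows if r[1] < 4096]    # rows loaded into t2blk1uec200
```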

Below is the Python script I used to run the tests and capture the results.

{code:python}
#
# Sketch of MdamTests2.py script
#
# Purpose: To test the performance of single subset vs. MDAM
# plans for a simple SQL scan, and to check what plan the 
# optimizer picks. The script prepares, explains 
# and executes a SQL statement for each plan,
# capturing certain details from explain and execute and
# packaging that as data that can be inserted into a SQL table
# for later analysis.
#
# The type of SQL statement expected is select * from t
# where <predicates on t>. That is, the script is just looking
# for a single SCAN node in the explain output.
#
#
# Flow:
#
# Input: A SQL statement, a file to append output
#
# Start sqlci process
# Do a showshape of the SQL statement, to determine the
# fully-qualified name of the table.
# exit;
#
# For each plan:
# Start a sqlci process
# Send the following commands to it:
# cqd last0_mode 'ON';
# a cqd or cqs to force a particular plan (or nothing, 
# if checking the Optimizer's choice)
# prepare s from <input text>
# explain s;
# execute s;
# display statistics;
# execute s;
# display statistics;
# execute s;
# display statistics;
# exit;
#
# Process the output from sqlci
# Iterate through the lines
# Extract: error or success from prepare
# Extract: various info from explain
# Extract: error or success from each execute
# Extract: statistical info from each display
#
# The plans considered are:
# 1. MDAM off (CQD MDAM_SCAN_METHOD "OFF")
# 2-n. MDAM forced (CQS, with varying numbers of columns),
# varying ESP parallelism on or off for each
# n+1. Optimizer choice (no forcing)
#
# This process output logic can use a state
# machine to navigate its way through the output
#
# Assemble all this stuff into a long row
# Write it out in SQL INSERT syntax, appending it
# to a file.
#
import os
import sys
import subprocess
import time
import re
import argparse  # requires Python 2.7

# global used to enumerate input and log files
sqlciSessionNumber = 0


def sqlciInputFileName():
    # Returns name to use for sqlci input file
    global sqlciSessionNumber
    return 'temp.' + str(os.getpid()) + '.' + str(sqlciSessionNumber) + '.sql'


def sqlciLogFileName():
    # Returns name to use for sqlci log file
    global sqlciSessionNumber
    fileName = 'temp.' + str(os.getpid()) + '.' + str(sqlciSessionNumber) + '.log'
    sqlciSessionNumber = sqlciSessionNumber + 1
    return fileName


def findFullyQualifiedTableName(statement, retain):
    # Find the fully qualified name of the table referenced
    # in the SQL statement
    inputFileName = sqlciInputFileName()
    logFileName = sqlciLogFileName()

    f = open(inputFileName, 'w')
    f.write('log ' + logFileName + ';\n')
    f.write('showshape\n')
    f.write(statement + ';\n')
    f.write('exit;\n')
    f.close()

    # look for "path 'X.Y.Z'"
    pattern = r".*path (?P<name>\'[A-Z0-9_]*\.[A-Z0-9_]*\.[A-Z0-9_]*\').*"
    prog = re.compile(pattern)

    tableName = None

    p1 = subprocess.Popen(["sqlci", "-i", inputFileName], stdout=subprocess.PIPE)
    for line in p1.stdout:
        result = prog.match(line)
        if result:
            tableName = result.group('name')
            break

    if not retain:
        os.unlink(inputFileName)
        os.unlink(logFileName)

    return tableName


def processOnePlan(statement, planNumber, tableName, g, retain):
    # Prepare, Explain and Execute the statement for a particular plan

    planCQDorCQS = None
    planName = "Invalid"
    if planNumber == 0:
        planCQDorCQS = None  # optimizer's choice
        planName = "Optimizer choice"
    elif planNumber == 1:
        planCQDorCQS = "CQD MDAM_SCAN_METHOD 'OFF';\n"  # simple scan
        planName = "Simple scan"
    elif planNumber > 1:
        planCQDorCQS = "control query shape "
        depth = planNumber // 2
        if planNumber % 2:
            planCQDorCQS = planCQDorCQS + "esp_exchange("
            planName = "Parallel mdam " + str(depth) + " deep"
        else:
            planName = "Serial mdam " + str(depth) + " deep"
        planCQDorCQS = planCQDorCQS + "scan(path " + tableName + \
            ", mdam forced, mdam_columns(dense"
        while depth > 1:
            planCQDorCQS = planCQDorCQS + ",sparse"
            depth = depth - 1
        planCQDorCQS = planCQDorCQS + "))"
        if planNumber % 2:
            planCQDorCQS = planCQDorCQS + ");\n"
        else:
            planCQDorCQS = planCQDorCQS + ";\n"

    # create an input file to contain the SQLCI commands

    inputFileName = sqlciInputFileName()
    logFileName = sqlciLogFileName()
    f = open(inputFileName, 'w')
    f.write('log ' + logFileName + ';\n')
    if planCQDorCQS:
        f.write(planCQDorCQS)
    f.write("CQD LAST0_MODE 'ON';\n")
    f.write('prepare s from ' + statement + ';\n')
    f.write('explain s;\n')
    f.write('execute s;\n')
    f.write('display statistics;\n')
    f.write('execute s;\n')
    f.write('display statistics;\n')
    f.write('execute s;\n')
    f.write('display statistics;\n')
    f.write('exit;\n')
    f.close()

    # execute the SQLCI commands

    p1 = subprocess.Popen(["sqlci", "-i", inputFileName], stdout=subprocess.PIPE)

    # initialize extracted data items

    timeStamp = time.time()
    prepareError = ''
    executeError = ''
    fragmentType = ''
    execPreds = ''
    mdamDisj = ''
    beginKey = ''
    endKey = ''
    executionInstance = -1
    startTime = ''
    endTime = ''
    recordsAccessed = '0'
    recordsUsed = '0'
    hdfsIOs = '0'
    hdfsIOBytes = '0'
    hdfsAccessTime = '0'

    # process the lines, extracting the information, and appending it to the
    # output file

    state = 0
    for line in p1.stdout:
        if state < 6:  # if no errors found yet
            if line.startswith('>>prepare s'):
                state = 1
            elif line.startswith('>>explain s'):
                state = 2
                explainState = 0
            elif line.startswith('>>execute s'):
                state = 3
            elif line.startswith('>>display statistics'):
                state = 4
                displayState = 0
                executionInstance = executionInstance + 1
                startTime = ''
                endTime = ''
                recordsAccessed = '0'
                recordsUsed = '0'
                hdfsIOs = '0'
                hdfsIOBytes = '0'
                hdfsAccessTime = '0'
            elif line.startswith('>>exit'):
                state = 5
            elif state == 1:
                if line.startswith('*** ERROR'):
                    prepareError = line.rstrip()
                    state = 6
            elif state == 2:
                if explainState == 0:
                    if line.startswith('TRAFODION_SCAN ='):
                        explainState = 1
                        appendState = ''
                elif explainState == 1:
                    if line.startswith(' fragment_type .......... '):
                        fragmentType = line.rstrip()
                        fragmentType = fragmentType[len(' fragment_type .......... '):]
                        appendState = ''
                    elif line.startswith(' executor_predicates .... '):
                        execPreds = line.rstrip()
                        execPreds = execPreds[len(' executor_predicates .... '):]
                        appendState = 'e'
                    elif line.startswith(' mdam_disjunct .......... '):
                        # actually there may be more than one of these; we just get the last one
                        mdamDisj = line.rstrip()
                        mdamDisj = mdamDisj[len(' mdam_disjunct .......... '):]
                        appendState = 'm'
                    elif line.startswith(' begin_key .............. '):
                        beginKey = line.rstrip()
                        beginKey = beginKey[len(' begin_key .............. '):]
                        appendState = 'b'
                    elif line.startswith(' end_key ................ '):
                        endKey = line.rstrip()
                        endKey = endKey[len(' end_key ................ '):]
                        appendState = 'f'
                    elif line.startswith(' '):
                        # continuation of a previous line
                        if appendState == 'm':
                            mdamDisj = mdamDisj + ' ' + line.strip()
                        elif appendState == 'e':
                            execPreds = execPreds + ' ' + line.strip()
                        elif appendState == 'b':
                            beginKey = beginKey + ' ' + line.strip()
                        elif appendState == 'f':
                            endKey = endKey + ' ' + line.strip()
                    elif line.find('====', 1) > 0:
                        # we have passed the end of TRAFODION_SCAN to some other operator
                        explainState = 2
                    else:
                        appendState = ''
            elif state == 3:
                if line.startswith('*** ERROR'):
                    executeError = line.rstrip()
                    state = 6
            elif state == 4:
                if displayState == 0:
                    if line.startswith('Start Time'):
                        startTime = line[len('Start Time'):]
                        startTime = startTime.strip()
                    elif line.startswith('End Time'):
                        endTime = line[len('End Time'):]
                        endTime = endTime.strip()
                    elif line.startswith('Table Name'):
                        displayState = 1
                elif displayState == 1:  # second header line (after Table Name)
                    displayState = 2
                elif displayState == 2:  # the actual table name
                    displayState = 3
                elif displayState == 3:  # a set of 5 integers
                    displayState = 4
                    values = line.split()
                    recordsAccessed = values[0]
                    recordsUsed = values[1]
                    hdfsIOs = values[2]
                    hdfsIOBytes = values[3]
                    hdfsAccessTime = values[4]
                    extractedStats = "INSERT INTO RESULTSTATS2 VALUES(" + str(timeStamp) + "," + \
                        str(executionInstance) + ",TIMESTAMP '" + \
                        startTime + "',TIMESTAMP '" + \
                        endTime + "'," + \
                        recordsAccessed + "," + \
                        recordsUsed + "," + \
                        hdfsIOs + "," + \
                        hdfsIOBytes + "," + \
                        hdfsAccessTime + ");\n"
                    g.write(extractedStats)

    extractedData = "INSERT INTO RESULTS2 VALUES(" + str(timeStamp) + ",'" + \
        planName + "','" + \
        args.statement + "','" + \
        prepareError + "','" + \
        executeError + "','" + \
        fragmentType + "','" + \
        execPreds + "','" + \
        mdamDisj + "','" + \
        beginKey + "','" + \
        endKey + "');\n"

    g.write(extractedData)

    if not retain:
        os.unlink(inputFileName)
        os.unlink(logFileName)

    return


# process command line arguments

parser = argparse.ArgumentParser(
    description='This script prepares, explains and executes a SQL statement '
                'and extracts information from these.')

parser.add_argument('--statement', required=True,
                    help='SQL statement to be processed.')
parser.add_argument('--output', required=True,
                    help='File to append extracted info to.')
parser.add_argument('--traversaldepth', required=True, type=int,
                    choices=[1, 2, 3, 4, 5, 6, 7, 8],
                    help='How many key columns MDAM should traverse.')
parser.add_argument('--retainfiles', action='store_true',
                    help='If specified, the sqlci input files and logs are retained.')

args = parser.parse_args()  # exits and prints help if args are incorrect

exitCode = 0

tableName = findFullyQualifiedTableName(args.statement, args.retainfiles)
if tableName:
    print "Testing statement " + args.statement
    print

    g = open(args.output, "a")
    plansToTry = 2 * (args.traversaldepth + 1)
    for planNumber in range(plansToTry):
        processOnePlan(args.statement, planNumber, tableName, g, args.retainfiles)

    g.close()
else:
    print "Could not obtain qualified table name from " + args.statement
    print
    exitCode = 1

exit(exitCode)



{code}
 

 

> Tweak some MDAM-related heuristics
> ----------------------------------
>
>                 Key: TRAFODION-2913
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2913
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.3
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>            Priority: Major
>
> While debugging a plan choice issue on a customer query, two issues were 
> noted with MDAM heuristics.
>  # When CQD FSO_TO_USE is set to '0', FileScanOptimizer::optimize attempts to 
> perform logic similar to that in ScanOptimizer::getMdamStatus, checking the 
> mdamFlag that is stored in the index descriptor. But the logic is not the 
> same (the inevitable result of having two copies of something!); in the 
> latter case the mdamFlag is ignored if CQD RANGESPEC_TRANSFORMATION is 'ON' 
> while in the FileScanOptimizer::optimize logic no such additional check is 
> made. Now, 'ON' is presently the default for RANGESPEC_TRANSFORMATION. So, 
> we have the anomaly that using CQD FSO_TO_USE '0' to force consideration of 
> MDAM might still not get MDAM because of a flag that we would ignore 
> otherwise.
>  # The mdamFlag in the IndexDesc object is set by IndexDesc :: pruneMdam 
> (optimizer/IndexDesc.cpp). There is heuristic logic there to guess whether 
> MDAM will be useful for a given access path. The major purpose of this logic 
> is index elimination: if we have several indexes, and some look like good 
> choices for MDAM and others not, we eliminate the ones that are not. Only 
> secondarily is this mdam flag later looked at by the scan optimizer, as 
> described above in 1. The major purpose of this logic still seems reasonable, 
> though the computation logic itself can be criticized for not considering the 
> possibility of a parallel predicate on a leading "_SALT_" column, for 
> example. But the computation involves a CQD, MDAM_SELECTION_DEFAULT, which is 
> set to a low value by default. The customer query involved showed that the 
> value used is too low; this flag ended up eliminating a favorable MDAM plan. 
> The default was likely last determined in the predecessor product; given that 
> the HBase engine has different execution dynamics this value needs to be 
> recalibrated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
