[jira] [Comment Edited] (DERBY-6211) Make Optimizer trace logic pluggable.

Rick Hillegas (JIRA) Tue, 11 Jun 2013 09:58:43 -0700

    [ 
https://issues.apache.org/jira/browse/DERBY-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680468#comment-13680468
 ]


Rick Hillegas edited comment on DERBY-6211 at 6/11/13 4:57 PM:
---------------------------------------------------------------

Attaching derby-6211-05-aa-xmlOptimizerTracer.diff. This patch adds an 
optimizer tracer which produces its output in xml. In addition, this patch adds 
an optional tool for viewing that xml output using SQL. I am running tests now.

I find that it is very hard to read the existing optimizer traces for the 
following reasons:

i) The trace output is not indented to indicate how facts relate to one another.

ii) In particular, it is hard to piece together the shapes of the query plans 
which are being evaluated.

I hope that this xml output is easier to read and understand. The output 
contains elements which nest like the corresponding optimizer data structures:

A) statement - This is the text of an SQL statement which needs optimization.

B) queryBlock - A statement may have many query blocks. For instance, a UNION 
statement contains many branches, each of which is its own query block. Most 
subqueries are also independent query blocks. Each query block is optimized in 
isolation from the others. The isolation goes so far that each query block gets 
its own optimizer instance.

C) joinOrder - For each query block, the optimizer may consider several join 
orders. These are the left-to-right orders in which tables would be accessed at 
execution time. The optimizer builds up a join order incrementally, starting 
from the leftmost position and adding more tables as it goes. The optimizer may 
abandon a join order before it is completely filled out. This happens if the 
optimizer determines that no completion of the join order can result in a plan 
that's cheaper than the best plan found so far.

D) decoration - As the optimizer fills out the join order, it also considers 
which conglomerate to use for each table and how to join the table to the table 
to its left. Of course, the leftmost table doesn't join to anything before it, 
so for the leftmost table the join strategy is always NESTED_LOOP.

The following other elements appear in the xml output:

E) planCost - The optimizer evaluates the costs of decorated join orders, 
including the costs of decorated but partial join orders. The planCost element 
nests inside the joinOrder element. In addition, each queryBlock contains a 
best planCost.

F) decConglomerateCost - This represents the cost of using a particular 
conglomerate to scan a table. This element nests inside the decoration element.

In addition to presenting cost information, the planCost element presents two 
descriptions of the decorated join order being evaluated:

1) summary - This is meant to be a compact, precise description of the plan 
which we might consider using in an optimizer override. This description 
identifies conglomerates by the (often cryptic) names stored in 
SYS.SYSCONGLOMERATES.

2) verbose - This is meant to be a more human-readable description of the plan. 
Tables are identified by their SQL names or by their correlation names in the 
query. In addition, index column names are included if the table is accessed by 
an index.

Although the optimizer considers how tables join leftward, English speakers 
will want to view the join order rightward. That is how the descriptions are 
written. In addition, I have introduced the following infix operators to 
represent the join strategies:

*               NESTED_LOOP
#               HASH_JOIN

I have also fully parenthesized the plan descriptions even though Derby only 
supports left-deep parentheses today. In the future, Derby may support bushy 
join orders, requiring different parenthesizing. Putting all of this together, 
here is a sample summary description:

((45b300a8-013f-33ba-d007-000003789be8 # b6b2c0ae-013f-33ba-d007-000003789be8) 
# 67bb80b4-013f-33ba-d007-000003789be8) # d8cd40ba-013f-33ba-d007-000003789be8

...and here is the corresponding verbose description:

((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4

For the following query...

select tablename from sys.systables t, sys.syscolumns c, sys.sysaliases a, 
sys.syssequences s
where t.tablename = c.columnname and c.columnname = a.alias and a.alias = 
s.sequencename

...here's a sample verbose plan description:

((C # T{TABLENAME,SCHEMAID}) # A{SCHEMAID,ALIAS,NAMESPACE}) # 
S{SCHEMAID,SEQUENCENAME}

I think that this xml output is much easier to read than traditional Derby 
optimizer traces. If you use a browser like Firefox, you can collapse and 
expand elements in order to expose just the information you want to see.

This patch also includes an optional tool (optimizerTracingViews) which gives 
you a SQL view of all of the planCost elements in the xml output. Here's an 
example of how to use xml optimizer tracing along with this optional viewing 
tool. Note that the tracer wants a file name argument but the viewer wants a 
file url argument:

-- turn on xml-based optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' );

-- run this query through the tracer
select * from tab1, tab2, tab3, tab4 where -tab1.b = tab2.b and tab2.a = tab3.a 
and tab3.a = tab4.a;

-- turn off optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'z.xml' );

-- load the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 
'file:///Users/rh161140/derby/mainline/z.xml' );

-- view the costs of all complete plans
select estimatedCost, verbose from planCost
where complete
order by estimatedCost
;

-- unload the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );

Here is the output of the query against the planCost view:

ESTIMATEDCOST           |VERBOSE                                                
                                                                         
------------------------------------------------------------
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3          
                                                                         
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4          
                                                                         
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4          
                                                                         
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3          
                                                                         
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4          
                                                                         
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4          
                                                                         
18224.740429176647      |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1          
                                                                         
18224.740429176647      |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1          
                                                                         
27806.645417006024      |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1          
                                                                         
27806.645417006024      |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1          
                                                                         
27903.296430890066      |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1          
                                                                         
27903.296430890066      |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1          
                                                                         
28336.34888640299       |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1          
                                                                         
28336.34888640299       |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1          
                                                                         
28336.34888640299       |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1          
                                                                         
28336.34888640299       |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1          
                                                                         
28336.34888640299       |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1          
                                                                         
28336.34888640299       |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1          
                                                                         

Here is the full shape of the planCost view:

(
    text varchar( 32672 ),
    stmtID    int,
    qbID   int,
    complete  boolean,
    summary   varchar( 32672 ),
    verbose   varchar( 32672 ),
    type        varchar( 50 ),
    estimatedCost        double,
    estimatedRowCount    bigint
)

I think this functionality is useful enough now that other people can 
test-drive it. Before writing regression tests for this patch, I would like to 
gather feedback from developers about how to improve this basic functionality. 
For instance, is this readable enough? Is there some information from optimizer 
traces which you often use but which is missing from the xml output?

Further improvements could include:

I) Adding more information to the xml trace. Right now, I have only implemented 
a subset of the trace methods.

II) Adding more SQL views for reading the xml trace output.


Touches the following files:

------------------

M       java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/Level2OptimizerImpl.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java
M       
java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java

Changes to the signatures of some of the trace methods so that they give the 
xml tracer the information it needs.

------------------

M       java/engine/org/apache/derby/iapi/sql/compile/JoinStrategy.java
M       
java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java
M       java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java

Each JoinStrategy now has an operator symbol for use in planCost descriptions.

------------------

A       java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerTracer.java

The new xml trace logic.

------------------

A       java/engine/org/apache/derby/impl/sql/compile/OptTraceViewer.java
M       java/engine/org/apache/derby/catalog/Java5SystemProcedures.java
M       tools/jar/extraDBMSclasses.properties

The new optional tool for viewing xml trace output.

                
      was (Author: rhillegas):
    Attaching derby-6211-05-aa-xmlOptimizerTracer.diff. This patch adds an 
optimizer tracer which produces its output in xml. In addition, this patch adds 
an optional tool for viewing that xml output using SQL. I am running tests now.

I find that it is very hard to read the existing optimizer traces for the 
following reasons:

i) The trace output is not indented to indicate how facts relate to one another.

ii) In particular, it is hard to piece together the shapes of the query plans 
which are being evaluated.

I hope that this xml output is easier to read and understand. The output 
contains elements which nest like the corresponding optimizer data structures:

A) statement - This is the text of an SQL statement which needs optimization.

B) queryBlock - A statement may have many query blocks. For instance, a UNION 
statement contains many branches, each of which is its own query block. Most 
subqueries are also independent query blocks. Each query block is optimized in 
isolation from the others. The isolation goes so far that each query block gets 
its own optimizer instance.

C) joinOrder - For each query block, the optimizer may consider several join 
orders. These are the left-to-right orders in which tables would be accessed at 
execution time. The optimizer builds up a join order incrementally, starting 
from the leftmost position and adding more tables as it goes. The optimizer may 
abandon a join order before it is completely filled out. This happens if the 
optimizer determines that no completion of the join order can result in a plan 
that's cheaper than the best plan found so far.

D) decoration - As the optimizer fills out the join order, it also considers 
which conglomerate to use for each table and how to join the table to the table 
to its left. Of course, the leftmost table doesn't join to anything before it, 
so for the leftmost table the join strategy is always NESTED_LOOP.

The following other elements appear in the xml output:

E) planCost - The optimizer evaluates the costs of decorated join orders, 
including the costs of decorated but partial join orders. The planCost element 
nests inside the joinOrder element. In addition, each queryBlock contains a 
best planCost.

F) decConglomerateCost - This represents the cost of using a particular 
conglomerate to scan a table. This element nests inside the decoration element.

In addition to presenting cost information, the planCost element presents two 
descriptions of the decorated join order being evaluated:

1) summary - This is meant to be a compact, precise description of the plan 
which we might consider using in an optimizer override. This description 
identifies conglomerates by the (often cryptic) names stored in 
SYS.SYSCONGLOMERATES.

2) verbose - This is meant to be a more human-readable description of the plan. 
Tables are identified by their SQL names or by their correlation names in the 
query. In addition, index column names are included if the table is accessed by 
an index.

Although the optimizer considers how tables join leftward, English speakers 
will want to view the join order rightward. That is how the descriptions are 
written. In addition, I have introduced the following infix operators to 
represent the join strategies:

*               NESTED_LOOP
#               HASH_JOIN

I have also fully parenthesized the plan descriptions even though Derby only 
supports left-deep parentheses today. In the future, Derby may support bushy 
join orders, requiring different parenthesizing. Putting all of this together, 
here is a sample summary description:

((45b300a8-013f-33ba-d007-000003789be8 # b6b2c0ae-013f-33ba-d007-000003789be8) 
# 67bb80b4-013f-33ba-d007-000003789be8) # d8cd40ba-013f-33ba-d007-000003789be8

...and here is the corresponding verbose description:

((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4

For the following query...

select tablename from sys.systables t, sys.syscolumns c, sys.sysaliases a, 
sys.syssequences s
where t.tablename = c.columnname and c.columnname = a.alias and a.alias = 
s.sequencename

...here's a sample verbose plan description:

((C # T{TABLENAME,SCHEMAID}) # A{SCHEMAID,ALIAS,NAMESPACE}) # 
S{SCHEMAID,SEQUENCENAME}

I think that this xml output is much easier to read than traditional Derby 
optimizer traces. If you use a browser like Firefox, you can collapse and 
expand elements in order to expose just the information you want to see.

This patch also includes an optional tool (optimizerTracingViews) which gives 
you a SQL view of all of the planCost elements in the xml output. Here's an 
example of how to use xml optimizer tracing along with this optional viewing 
tool. Note that the tracer wants a file name argument but the viewer wants a 
file url argument:

-- turn on xml-based optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' );

-- run this query through the tracer
select * from tab1, tab2, tab3, tab4 where -tab1.b = tab2.b and tab2.a = tab3.a 
and tab3.a = tab4.a;

-- turn off optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'z.xml' );

-- load the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 
'file:///Users/rh161140/derby/mainline/z.xml' );

-- view the costs of all complete plans
select estimatedCost, verbose from planCost
where complete
order by estimatedCost
;

-- unload the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );

Here is the output of the query against the planCost view:

ESTIMATEDCOST           |VERBOSE                                                
                                                                         
------------------------------------------------------------
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3          
                                                                         
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4          
                                                                         
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4          
                                                                         
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3          
                                                                         
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4          
                                                                         
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4          
                                                                         
18224.740429176647      |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1          
                                                                         
18224.740429176647      |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1          
                                                                         
27806.645417006024      |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1          
                                                                         
27806.645417006024      |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1          
                                                                         
27903.296430890066      |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1          
                                                                         
27903.296430890066      |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1          
                                                                         
28336.34888640299       |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1          
                                                                         
28336.34888640299       |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1          
                                                                         
28336.34888640299       |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1          
                                                                         
28336.34888640299       |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1          
                                                                         
28336.34888640299       |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1          
                                                                         
28336.34888640299       |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1          
                                                                         


I think this functionality is useful enough now that other people can 
test-drive it. Before writing regression tests for this patch, I would like to 
gather feedback from developers about how to improve this basic functionality. 
For instance, is this readable enough? Is there some information from optimizer 
traces which you often use but which is missing from the xml output?

Further improvements could include:

I) Adding more information to the xml trace. Right now, I have only implemented 
a subset of the trace methods.

II) Adding more SQL views for reading the xml trace output.


Touches the following files:

------------------

M       java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/Level2OptimizerImpl.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java
M       
java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java

Changes to the signatures of some of the trace methods so that they give the 
xml tracer the information it needs.

------------------

M       java/engine/org/apache/derby/iapi/sql/compile/JoinStrategy.java
M       
java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java
M       java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java

Each JoinStrategy now has an operator symbol for use in planCost descriptions.

------------------

A       java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerTracer.java

The new xml trace logic.

------------------

A       java/engine/org/apache/derby/impl/sql/compile/OptTraceViewer.java
M       java/engine/org/apache/derby/catalog/Java5SystemProcedures.java
M       tools/jar/extraDBMSclasses.properties

The new optional tool for viewing xml trace output.

                  
> Make Optimizer trace logic pluggable.
> -------------------------------------
>
>                 Key: DERBY-6211
>                 URL: https://issues.apache.org/jira/browse/DERBY-6211
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.11.0.0
>            Reporter: Rick Hillegas
>            Assignee: Rick Hillegas
>         Attachments: derby-6211-01-aa-createPlugin.diff, 
> derby-6211-02-aa-cleanup.diff, derby-6211-02-ab-cleanup.diff, 
> derby-6211-03-aa-customTracer.diff, 
> derby-6211-04-aa-moveOptimizerTracerToEngineJar.diff, 
> derby-6211-05-aa-xmlOptimizerTracer.diff
>
>
> Right now the trace logic in the optimizer is hard-coded to produce a stream 
> of diagnostics. It would be good to be able to plug alternative trace logic 
> into the optimizer. This would make the following possible:
> 1) Plug in trace logic which produces formats which are easier to study and 
> which can be analyzed mechanically. E.g., xml formatted output.
> 2) Plug in trace logic which can be used during unit testing to verify that 
> the optimizer has picked the right plan. Over time this might make it easier 
> to migrate canon-based tests to assertion-based tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (DERBY-6211) Make Optimizer trace logic pluggable.

Reply via email to