[
https://issues.apache.org/jira/browse/DERBY-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480546
]
A B commented on DERBY-47:
--------------------------
> 1) In one of the comments, you said:
>
> We intentionally use a parameter node instead of a constant node
> because the IN-list has more than one value
>
> It's not *always* true that the IN-list has more than one value, right?
> That is, it would be legal, if not very useful, to write
>
> SELECT * FROM t WHERE c IN (1)
Yes, an IN-list can only have a single value. However, if such an IN-list
occurs we will convert it into an equality predicate as part of the first "if"
branch in the preprocess() method:
/* Check for the degenerate case of a single element in the IN list.
* If found, then convert to "=".
*/
if (rightOperandList.size() == 1)
...
Thus we won't ever get to the code referenced above. But you're right, it
might be good to add an explanatory comment to explain this.
> 2) Do you have any test cases of the form
>
> WHERE c IN (SELECT c_prime from t_prime)
>
> That is, where the IN-list is neither a list of literal constants, nor a
> list of parameter markers, but is instead a subquery whose values will
> be used as the list.
A few test cases for this kind of query already exist in lang/inbetween.sql and
also in lang/subquery.sql. So I did not add any new test cases with my
changes. Is there anything in particular you think could be problematic here?
> Does such a query generate and use the new style Multi-Probe processing?
No, it does not. Multi-probe processing only occurs if the IN-list is solely
comprised of constant and/or parameter nodes. A subquery is neither constant
nor parameter, hence no multi-probing will occur. The code for this is in the
preprocess() method of InListOperatorNode:
else if ((leftOperand instanceof ColumnReference) &&
rightOperandList.containsOnlyConstantAndParamNodes())
{
<create probe predicate>
}
> 3) Do you have any test cases in which the IN-list predicate references a
> column in a UNION or a UNION view, thus requiring pushing the IN-list
> predicate down and pulling it back up?
I added a few, simplified test queries for this situation to lang/inbetween.sql
as part of d47_mp_addlTestCases.patch (under the heading "Nested queries with
unions and top-level IN list"). There are also a handful of queries in
lang/predicatePushdown.sql that include IN lists in addition to equality
predicates. Feel free to comment if you think more testing should be done
here...
> 4) In the change to OrNode.java, my eye was caught by the variable name
> "beon".
> I've seen that a common convention is to use an acronym, so maybe "beon"
> stood for Binary Equality Operator Node, or something like that, but the
> actual datatype is Binary*Relational*OperatorNode, so I would have
> expected to see a variable "bron".
>
> BinaryRelationalOperatorNode beon = null;
>
> It's totally unimportant, but I saw it and thought I'd mention it.
Oops, good catch :) Just a typo, I will try to fix this up.
> 5) I looked at the updated masters; to my eye they show that the new probe
> predicate is working, and the optimizer is choosing to use index-to-base
> processing for these predicates rather than the formerly-chosen table
> scans, so this looks great to me.
Thank you for taking a look at these--this was an unexpected but very welcome
review!
> 7) For some reason, I expected to see something more vivid indicating the
> use of the new execution strategy in the query plans, I thought maybe I'd
> see something like "MultiProbeTableScanResultSet" in query dumps? Is it
> just these tests that don't show that, and other query dumps would?
This had occurred to me, as well, but then I realized that in all of the test
cases the MultiProbingTableScanResultSet shows up as the child of an
IndexToBaseRowResultSet, which I think means it doesn't actually get printed in
the query plans (only the info for the IndexToBaseRowResultSet is printed).
This is simliar, I think, to BulkTableScanResultSet, which is not printed in
query plans, either (I don't think...I believe the only way to tell is by
looking at the fetch size).
> Or is the only indication that the new probing code has been chosen the
> use of the index in place of the table scan?
Right now I think the only way to tell is to look at the use of index *and* the
number of rows visited/qualified: if we used an index but did not do
multi-probing, we will probably see far more rows visited than if we used
multi-probing (that won't always be the case, but is generally true). Perhaps
work to add an explicit indication of multi-probing to the query plan can be
handled as a separate enhancement?
These are great observations and great questions--thank you for posing them!
And if you have any others, please continue to ask. I'm definitely grateful
for the feedback...
> Some possible improvements to IN optimization
> ---------------------------------------------
>
> Key: DERBY-47
> URL: https://issues.apache.org/jira/browse/DERBY-47
> Project: Derby
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 10.0.2.0
> Environment: all
> Reporter: Sunitha Kambhampati
> Assigned To: A B
> Attachments: d47_engine_doNotCommit_v1.patch,
> d47_engine_doNotCommit_v1.stat, d47_mp_addlTestCases.patch,
> d47_mp_CBO_MoAP_v1.patch, d47_mp_CBO_MoAP_v1.stat, d47_mp_codeGen_v1.patch,
> d47_mp_codeGen_v1.stat, d47_mp_exec_v1.patch, d47_mp_exec_v1.stat,
> d47_mp_masters_v1.patch, d47_mp_preprocess_v1.patch,
> d47_mp_preprocess_v1.stat, d47_mp_relOpPredCheck_v1.patch,
> d47_mp_relOpPredCheck_v1.stat, derby-47-performance-data.txt,
> derby-47-performance-data.txt, Derby47PerformanceTest.java,
> Derby47PerformanceTest.java, InListOperatorNode.java,
> QueryPlanUniqueIndexAndWordIndexOneTerm.txt,
> QueryPlanUniqueIndexAndWordIndexTwoTerms.txt,
> QueryPlanUniqueIndexOnlyOneTerm.txt, QueryPlanUniqueIndexOnlyTwoTerms.txt,
> readlocks.diff, readlocks_withContext.diff, readlocks_withContext.diff
>
>
> Consider a simple case of -
> A table tbl has 10000 rows, there is a primary key index on i1
> and the query in question is
> select * from tbl where i1 in (-1,100000)
> derby does a table scan of the entire table even though the "IN" list has
> only two values and the comparison is on a field that has an index.
> Briefly looking at the code, it seems like we insert a between and use the IN
> list to get the start and stop values for the scan. Thus the range of the
> values in the "IN" list here plays an important role.
> Thus if the query was changed to select * from tbl where i1 in (-1, 1), an
> index scan would be chosen.
> It would be nice if we could do something clever in this case where there is
> clearly an index on the field and the number of values in the IN list is
> known. Maybe use the rowcount estimate and the IN list size to do some
> optimizations.
> - consider the length of the "IN" list to do searches on the table. ie use
> the IN list values to do index key searches on the table,
> -or try to convert it to a join. Use the "IN" list values to create a
> temporary table and do a join. It is most likely that the optimizer will
> choose the table with "IN" list here as the outer table in the join and thus
> will do key searches on the larger table.
> -------------------------------------------------------------------
> some query plans that I logged using derby.language.logQueryPlan=true for
> some similar queries:
> Table has ascending values from 0 - 9999 for i1. primary key index on i1.
> GMT Thread[UT0,5,main] (XID = 19941), (SESSIONID = 0), select * from
> scanfixed where i1 in (-1,9999,9998,9997,9996,9995,9994,9993,9992,9991,9990)
> ******* Project-Restrict ResultSet (2):
> Number of opens = 1
> Rows seen = 10000
> Rows filtered = 9990
> restriction = true
> projection = false
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 750.38
> optimizer estimated cost: 8579.46
> Source result set:
> Table Scan ResultSet for SCANFIXED at read committed isolation level
> using instantaneous share row locking chosen by the optimizer
> Number of opens = 1
> Rows seen = 10000
> Rows filtered = 0
> Fetch Size = 16
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> next time in milliseconds/row = 0
> scan information:
> Bit set of columns fetched=All
> Number of columns fetched=9
> Number of pages visited=417
> Number of rows qualified=10000
> Number of rows visited=10000
> Scan type=heap
> start position:
> null stop position:
> null qualifiers:
> Column[0][0] Id: 0
> Operator: <=
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> Column[0][1] Id: 0
> Operator: <
> Ordered nulls: false
> Unknown return value: true
> Negate comparison result: true
> optimizer estimated row count: 750.38
> optimizer estimated cost: 8579.46
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> l
> 2004-10-14 18:59:47.577 GMT Thread[UT0,5,main] (XID = 19216), (SESSIONID =
> 0), select * from scanfixed where i1 in
> (9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) ******* Project-Restrict
> ResultSet (3):
> Number of opens = 1
> Rows seen = 10
> Rows filtered = 0
> restriction = true
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 4.80
> optimizer estimated cost: 39.53
> Source result set:
> Index Row to Base Row ResultSet for SCANFIXED:
> Number of opens = 1
> Rows seen = 10
> Columns accessed from heap = {0, 1, 2, 3, 4, 5, 6, 7, 8}
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> optimizer estimated row count: 4.80
> optimizer estimated cost: 39.53
> Index Scan ResultSet for SCANFIXED using index SCANFIXEDX at
> read committed isolation level using instantaneous share row locking chosen
> by the optimizer
> Number of opens = 1
> Rows seen = 10
> Rows filtered = 0
> Fetch Size = 16
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> next time in milliseconds/row = 0
> scan information:
> Bit set of columns fetched=All
> Number of columns fetched=2
> Number of deleted rows visited=0
> Number of pages visited=2
> Number of rows qualified=10
> Number of rows visited=10
> Scan type=btree
> Tree height=2
> start position:
> >= on first 1 column(s).
> Ordered null semantics on the following columns:
> stop position:
> > on first 1 column(s).
> Ordered null semantics on the following columns:
> qualifiers:
> None
> optimizer estimated row count: 4.80
> optimizer estimated cost: 39.53
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.