[ 
https://issues.apache.org/jira/browse/DERBY-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250860#comment-14250860
 ] 

Mike Matrigali commented on DERBY-6784:
---------------------------------------

In Army's original work there is an interesting note that applies directly to 
the class of queries I would like to 
improve with work on this issue:

 2. Not surprisingly, the costing logic for probing as implemented in DERBY-47 
is not perfect. It works great in situations where the IN list is significantly 
smaller than the number of rows in the table--ex. we see good results for 100k 
rows. However, I discovered that if we just run with 10,000 rows, then once we 
hit 1,000 values in the IN list the costing of probe predicates causes the 
optimizer to think that it would be too expensive, so it (the optimizer) ends 
up doing a table scan. In truth the table scan is still far slower than index 
probing, but the relative size of the IN list with respect to the number of 
rows in the table throws the costing off. To confirm this I just removed the 
probing cost logic (so that it effectively becomes the cost of a single "col = 
?" predicate) and then the optimizer chose to do index probing for the 10,000 
row scenario, which was much, much faster (as expected).

My feeling is that any work related to investigation/resolution of these two 
issues can and should be completed as part of a new Jira...

> change optimizer to choose in list multiprobe more often
> --------------------------------------------------------
>
>                 Key: DERBY-6784
>                 URL: https://issues.apache.org/jira/browse/DERBY-6784
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.11.1.1
>            Reporter: Mike Matrigali
>
> Using the  multi-probe join strategy is an obvious performance win when
> the optimizer chooses it.  There are cases currently where the costing 
> makes the optimizer choose other plans which do not perform as well as
> the multi-probe strategy.
> The class of queries that are affected are those where the number of terms
> in the IN LIST is large relative to the number of rows in the table, and there
> is a useful index to probe for the column that is referenced by the IN LIST.
> There are multiple benefits to choosing the multi-probe strategy, including
> the following:
> 1) often better execution time, where the alternative is to do a full table 
>     merge on the column.
> 2) The multi-probe strategy results in "pushing" the work into the store, 
>      and this may result in more concurrent behavior (see DERBY-6300 and 
> DERBY-6301).   First less rows may
>      be locked by probing rather than full table scan (and in the worst case
>      same number if query manages to probe on every value in table).  
>      Second depending on isolation level of the query store will only matching
>      rows, while in the current implementation all rows that are returned by
>      store for qualification above store will remain locked whether they 
>      qualify or not.   Especially in small table cases other query plan 
> choices
>      have been changed to favor probing indexes rather than full table scans
>      even if pure cpu is better with table scan.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to