Re: [Firebird-devel] Optimisation of where afield = coalesce(value1, value2 )
10.07.2017 14:20, Paul Reeves wrote: I can understand that this plan might appear to be invalid from the perspective of the optimiser. But surely the whole point of adding the PLAN clause is because I think I know better than the optimiser what I want. Unless the engine physically cannot execute your plan. This was exactly the case. So to return to the original query: where e.EMP_NO = COALESCE( ?, e.emp_no ) surely I ought to be able to add PLAN (E INDEX (RDB$PRIMARY7)) Then if the parameter resolves to a value at run-time the index will be used. Obviously if the parameter is null performance would be much worse than a natural scan because the engine must walk the index pages and then walk the data pages. But the customer is always right, surely :-) You seem to believe that the engine can execute the INDEX plan without lower/upper bounds (which require e.emp_no to be known in advance). In theory, it could. In practice, it's pointless and thus prohibited by the optimizer. Why not create a second stream in that case? that seems to be what happens when I create this query: SELECT * FROM employee WHERE EMP_NO = 2 OR EMP_NO is null this plan is chosen... PLAN (EMPLOYEE INDEX (RDB$PRIMARY7, RDB$PRIMARY7)) This is still a single stream and bitmap based on two index scans. Solution with two streams (chosen conditionally at runtime) is implemented in FB3 but only for some specific syntax. Perhaps your COALESCE trick could join this special category, but it doesn't yet. Because there are two streams the fetches and the indexed reads more or less double because of the OR clause. (Again, a subject for another day is why does the optimiser even bother with the second stream when EMP_NO can never be null ? ) The optimizer don't take NOT NULL constraints into account. They don't have existence locks and thus can affect query results if dropped in the meantime. Dmitry -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] Optimisation of where afield = coalesce(value1, value2 )
On Mon, 10 Jul 2017 12:31:43 +0300 Dmitry Yemanov wrote > > > The other question I asked was... > > > > why can't I force the plan... > > > >PLAN (E INDEX (RDB$PRIMARY7)) > > Because it's invalid in this case. > I can understand that this plan might appear to be invalid from the perspective of the optimiser. But surely the whole point of adding the PLAN clause is because I think I know better than the optimiser what I want. So to return to the original query: where e.EMP_NO = COALESCE( ?, e.emp_no ) surely I ought to be able to add PLAN (E INDEX (RDB$PRIMARY7)) Then if the parameter resolves to a value at run-time the index will be used. Obviously if the parameter is null performance would be much worse than a natural scan because the engine must walk the index pages and then walk the data pages. But the customer is always right, surely :-) ( I also understand there is a downside to the customer always being right argument - one bad customer could screw everything up for all the others. But is that argument being used here or is it just because of the way the optimiser works? ) > > What are the rules for deciding when a PLAN statement will be > > rejected? > >>From one logical point of view if have this sort of query > > > >select * from mytable m where afield = whatever; > > > > and I add > > > >plan (m index(myindex )) > > > > the engine should accept that. Shouldn't it? > > It depends on "whatever". If it's literal or independent expression > or priorly evaluated field, index scan can be used. If it includes a > field from the same stream, it cannot. > Why not create a second stream in that case? that seems to be what happens when I create this query: SELECT * FROM employee WHERE EMP_NO = 2 OR EMP_NO is null this plan is chosen... PLAN (EMPLOYEE INDEX (RDB$PRIMARY7, RDB$PRIMARY7)) Because there are two streams the fetches and the indexed reads more or less double because of the OR clause. (Again, a subject for another day is why does the optimiser even bother with the second stream when EMP_NO can never be null ? ) Anyway, none of the above is any sort of criticism. I'm just trying to get a better understanding of the rules and maybe find out where things are broken that could be fixed easily (or hacked around). Paul -- Paul Reeves http://www.ibphoenix.com Supporting users of Firebird -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] Optimisation of where afield = coalesce(value1, value2 )
10.07.2017 12:17, Paul Reeves пишет: On Fri, 7 Jul 2017 18:07:55 +0300 Dmitry Yemanov wrote 07.07.2017 17:51, Paul Reeves wrote: I understand that evaluating COALESCE(?, e.emp_no ) at prepare time may require a circular logic and is thus impractical It cannot be done at runtime either (without deducting that e.emp_no is the same in both paths of the condition). but surely the more important piece of information is in the where e.emp_no = A unique index exists on this column so surely the optimiser should choose it instead of a natural scan ? No, it cannot. Imagine "where T.A = T.B", it cannot use an index for either A or B. Only full table scan is possible. The other question I asked was... why can't I force the plan... PLAN (E INDEX (RDB$PRIMARY7)) Because it's invalid in this case. What are the rules for deciding when a PLAN statement will be rejected? From one logical point of view if have this sort of query select * from mytable m where afield = whatever; and I add plan (m index(myindex )) the engine should accept that. Shouldn't it? It depends on "whatever". If it's literal or independent expression or priorly evaluated field, index scan can be used. If it includes a field from the same stream, it cannot. Dmitry -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] Optimisation of where afield = coalesce(value1, value2 )
On Fri, 7 Jul 2017 18:07:55 +0300 Dmitry Yemanov wrote > 07.07.2017 17:51, Paul Reeves wrote: > > > But that doesn't answer all my questions... > > > > Given > > > >where e.EMP_NO = COALESCE(?, e.emp_no ) > > > > and that there is an index on EMP_NO, why doesn't the optimiser > > default to the index. After all, it is logically more likely that a > > value will be passed in the where condition, rather than a NULL. > > To evaluate COALESCE, e.emp_no must be known. How it can be known > before we start reading the table (via index scan)? Chicken and egg > problem. > I understand that evaluating COALESCE(?, e.emp_no ) at prepare time may require a circular logic and is thus impractical but surely the more important piece of information is in the where e.emp_no = A unique index exists on this column so surely the optimiser should choose it instead of a natural scan ? The other question I asked was... why can't I force the plan... PLAN (E INDEX (RDB$PRIMARY7)) This error is raised... 'cannot be used in the specified plan' And a supplementary question (perhaps it merits a separate thread ) What are the rules for deciding when a PLAN statement will be rejected? >From one logical point of view if have this sort of query select * from mytable m where afield = whatever; and I add plan (m index(myindex )) the engine should accept that. Shouldn't it? Paul -- Paul Reeves http://www.ibphoenix.com Supporting users of Firebird -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] Optimisation of where afield = coalesce(value1, value2 )
07.07.2017 18:26, Adriano dos Santos Fernandes wrote: BTW, isn't ConditionalStream used for something in this field? Yep, but the optimizer so far handles just one specific case (A = ? or ? is null). It could be extended though. Dmitry -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] Optimisation of where afield = coalesce(value1, value2 )
> To evaluate COALESCE, e.emp_no must be known. Why? Is it not really the case that for all practical purposes the COALESCE will always return a value How else could COALESCE(?, NULL ) [Example #3] use an index? {Nothing says that the input parameter won't be NULL} Sean -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] Optimisation of where afield = coalesce(value1, value2 )
On 07/07/2017 12:19, Dmitry Yemanov wrote: > 07.07.2017 18:12, Dimitry Sibiryakov wrote: >> >> In this particular case it is enough to know parameter value to >> choose plan. Parameters are known before reading table. > > True, but our optimizer is developed for generic cases, not such > specific ones. It could be improved, but I'd say we have more > important issues to solve there. > BTW, isn't ConditionalStream used for something in this field? Adriano -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] Optimisation of where afield = coalesce(value1, value2 )
07.07.2017 18:12, Dimitry Sibiryakov wrote: In this particular case it is enough to know parameter value to choose plan. Parameters are known before reading table. True, but our optimizer is developed for generic cases, not such specific ones. It could be improved, but I'd say we have more important issues to solve there. Dmitry -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] Optimisation of where afield = coalesce(value1, value2 )
07.07.2017 17:07, Dmitry Yemanov wrote: Given where e.EMP_NO = COALESCE(?, e.emp_no ) and that there is an index on EMP_NO, why doesn't the optimiser default to the index. After all, it is logically more likely that a value will be passed in the where condition, rather than a NULL. To evaluate COALESCE, e.emp_no must be known. How it can be known before we start reading the table (via index scan)? Chicken and egg problem. In this particular case it is enough to know parameter value to choose plan. Parameters are known before reading table. -- WBR, SD. PS: Is it only my Thunderbird has started to send answers to authors instead of list? -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] Optimisation of where afield = coalesce(value1, value2 )
07.07.2017 17:51, Paul Reeves wrote: But that doesn't answer all my questions... Given where e.EMP_NO = COALESCE(?, e.emp_no ) and that there is an index on EMP_NO, why doesn't the optimiser default to the index. After all, it is logically more likely that a value will be passed in the where condition, rather than a NULL. To evaluate COALESCE, e.emp_no must be known. How it can be known before we start reading the table (via index scan)? Chicken and egg problem. Dmitry -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] Optimisation of where afield = coalesce(value1, value2 )
On Fri, 7 Jul 2017 11:27:26 -0300 Adriano dos Santos Fernandes wrote > > > It does not evaluate COALESCE at prepare time. > > It's just a expression which may or may not contain fields. > > If it doesn't contain fields, it will be the same as a simple "?" for > the plan calculation purposes. > > OK. That is not the answer I was hoping for :-) But that doesn't answer all my questions... Given where e.EMP_NO = COALESCE(?, e.emp_no ) and that there is an index on EMP_NO, why doesn't the optimiser default to the index. After all, it is logically more likely that a value will be passed in the where condition, rather than a NULL. In fact, considering the index is a primary key where e.EMP_NO = NULL would make no sense at all. Surely the logic ought to be if unique index on field then use index Also, even though a field is specified in the coalesce why can't I force the plan... PLAN (E INDEX (RDB$PRIMARY7)) This error is raised... 'cannot be used in the specified plan' Paul -- Paul Reeves http://www.ibphoenix.com Supporting users of Firebird -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Re: [Firebird-devel] Optimisation of where afield = coalesce(value1, value2 )
On 07/07/2017 11:19, Paul Reeves wrote: > Given this simple SQL statement > > select e.emp_no, e.full_name > from employee e > where e.emp_no = coalesce(value1,value2 ) > > the plan varies depending on the following : > > 1. where e.EMP_NO = COALESCE(?, ? ) -- PLAN (E INDEX (RDB$PRIMARY7)) > > 2. where e.EMP_NO = COALESCE(2, ? ) -- PLAN (E INDEX (RDB$PRIMARY7)) > > 3. where e.EMP_NO = COALESCE(?, NULL ) -- PLAN (E INDEX (RDB$PRIMARY7)) > > 4. where e.EMP_NO = COALESCE(2, NULL ) -- PLAN (E INDEX (RDB$PRIMARY7)) > > 5. where e.EMP_NO = COALESCE(NULL, 2 ) -- PLAN (E INDEX (RDB$PRIMARY7)) > > 6. where e.EMP_NO = COALESCE(NULL, e.emp_no ) -- PLAN (E NATURAL) > > 7. where e.EMP_NO = COALESCE(2, e.emp_no ) -- PLAN (E NATURAL) > > 8. where e.EMP_NO = COALESCE(?, e.emp_no ) -- PLAN (E NATURAL) > > ... > And examples 2, and 4 seem to prove that the optimiser is capable of > evaluating the COALESCE during prepare and choosing an index. ... > However we have seen in > examples 1 and 3 that it is equally capable of choosing an index when > evaluating the coalesce even if the final evaluation may be null. > > And so here are the questions > > - is the plan produced in example 8 wrong or right ? > - the above examples seem to lack coherence so how much is COALESCE > evaluated during a prepare ? > - should the optimiser assume that the first parameter will > always resolve to a value > - how does the optimiser consider COALESCE when this construct is used > in a stored procedure? > > It does not evaluate COALESCE at prepare time. It's just a expression which may or may not contain fields. If it doesn't contain fields, it will be the same as a simple "?" for the plan calculation purposes. Adriano -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel