> On Aug 11, 2015, at 1:30 PM, 'Louis van Alphen' lo...@nucleo.co.za 
> [firebird-support] <firebird-support@yahoogroups.com> wrote:
> 
> Thanks Ann, so maybe the 2 tools' metric are different. So maybe FBTM
> includes all (low level) reads and IP only user table reads.

Or maybe one runs some queries before the one it measures so all the system 
table queries are compiled, optimized, and executed, filling the cache with 
good stuff.
> 
> Also, if I remove the order by, IP reports a drastic reduction is
> execution time i.e. down to 6msec. Does the ORDER BY on the PK make such a
> difference?
>> 

Unh, maybe.   Between the list, my mail system and yours, I've lost most of the 
formatting and can't look at the original while typing (grrr....)   You seem to 
use left outer joins and that can be a problem because (in general) the order 
in which outer joins are declared in the query is the order in which they have 
to be accessed in the query plan.   

For example, a three way join of students, registrations, and courses with 
inner joins can be run starting with any of the three tables and get the same 
results.  

Select s.student_name, c.course_name 
    from students s 
         join registrations r on r.student_id = s.student_id
         join courses c on c.course_id = r.course_id

The optimizer can choose to start with registrations and lookup students and 
courses by primary key, or students and look up a registration by student_id 
and a course by the course_id in registrations, or courses -> registrations -> 
students.  Or sort the students and registrations and merge them, looking up 
courses from the matching pairs.  Or other things, probably.

However, this query can be executed in only one way: students -> registrations 
-> courses

Select s.student_name, c.course_name 
    from students s 
        left outer join registrations r on r.student_id = s.student_id
         left outer join courses c on c.course_id = r.course_id

If that way isn't optimal, too bad.  The query must be executed that way or 
you'll miss all the students who aren't registered for courses and all the 
registrations that don't correspond to any courses.

It may be that adding the sort, you've convinced Firebird that it has to read 
and sort all the records in that table, then start from the sorted result.  Try 
using one of the various syntaxes that tells it to return only the first record 
found that matches the criteria.  That may cause Firebird to walk the index ... 
read records in index order ...  rather than reading the table and sorting it.

Now that probably sounds really dumb, but in fact, walking a table in index 
order is generally a lot slower than reading it in physical order and sorting 
the result.  Yes, sort is n(log n), but there's a big K applied that is the 
cost of a random record read.  So, if you expect a lot of results, sort first 
and ask for matches later....   If not, use 'Select FIRST <n>' or 'LIMIT <n>' 
or whatever the standards committee chose to bless ... maybe  'FETCH FIRST 
<n>'.  Any of those hints will convince Firebird to walk the table in index 
order.

This query is likely to read and sort the students first, then start looking 
for the rest of the data unless there's a good index on student_name.  

Select s.student_name, c.course_name 
    from students s 
        left outer join registrations r on r.student_id = s.student_id
         left outer join courses c on c.course_id = r.course_id
     where s.student_name = 'Dinah MacQuarie'
     order by s.student_id

This may give a faster result in the absence of an student_name index.  It may 
not, but do consider trying it when playing with query forms...

Select first 30 s.student_name, c.course_name 
    from students s 
        left outer join registrations r on r.student_id = s.student_id
         left outer join courses c on c.course_id = r.course_id
     where s.student_name = 'Dinah MacQuarie'
     order by s.student_id

It's worth noting that Firebird does not store records in primay key order 
unless the records happen to arrive in primary key order.  Even when records 
are created in primary key order - as with a generated key - deleted records 
may be replaced with records created much later.   Lots of databases cluster 
records on primary key, making it unnecessary to sort when reading the primary 
key.   That has other costs, particularly with natural keys...  This post is 
already way to long to take on that issue.


Now, just to complicate things, it's possible to turn left outer joins to inner 
joins - at least if you're a person.   Maybe the Firebird optimizer knows about 
this, and maybe it doesn't.  Certainly in the previous century one database 
developer - initials JAS - thought that if you were dumb enough to use outer 
joins when you shouldn't, the optimizer shouldn't reward your stupidity.   He's 
become more benevolent in his old age.  Anyway, here's the case:

Select s.student_name, c.course_name 
    from students s 
         join registrations r on r.student_id = s.student_id
         join courses c on c.course_id = r.course_id
     where c.department_id = (select d.dpeartment_id from departments d
                                                    where d.department_name = 
'Computer Science')

That means you're going to throw out all the students without registrations, 
all of the registrations without courses, and all of the courses that aren't in 
Computer Science.  So it devolves back to inner joins and can be optimized in 
the more or less rational order courses -> registrations -> students.  Unless 
you've got a university with tens of students and hundreds of CS courses...

So to add to my confusion, I don't know if your outer joins have been corrected 
to be inner joins based on the criteria in the query on the view.

> My other question is why is there 'WARNING' against some of the indexes? Are
> the stats out of spec?

I've deleted the index stats because the combination of mail programs has left 
them unintelligible garbage - look back in this thread for a formatted version. 
  Basically the only index that doesn't have a warning is unique and has a 
selectivity of 3.nn e07.  There's another that's 3.nn e06, which is pretty good 
- but IIRC (did I mention hating my mail program?) has a very long duplicate 
chain.  Look a the numbers of unique values, the numbers of duplicates in 
general, and the largest number of duplicates for any one value.

Again, this problem is complicated by reality.  

Lets assume your students can be alive or dead.  Most of the students are 
alive.  An index on student.death_state would be useless for looking for live 
students, but very useful for looking for dead students.  If you spend a lot of 
time looking for the courses taken by dead students, the index is valuable.  If 
not, not.  So an index with lots of duplicates on one value is useful, if 
you're looking for other values.   

Similarly, an index with a fair number of duplicates on each value can be 
useful - think of student birth dates.  Most of them are going to be between 
1992 and 1997 so a large university will have lots of duplicates, but the index 
is great for finding the young and the old, and OK for figuring out who can 
drink legally in Wyoming.

The index analyzer has no clue about your patterns of data usage.  You do.  
Look at the statistics and decide if there are indexes that don't generally 
eliminate 90% of the records in the table when used for an equality retrieval.

>> 
>> I am not understanding what is going on.

I doubt that I have helped.  However, let me summarize.  

Don't sort things if you don't care about the order of retrieval.  If you do, 
and you can anticipate the number of rows that should be returned within an 
order of magnitude, limit the number of records returned.

Don't use an outer join unless you need to preserve unmatched records.

Do consider the distribution of duplicates in your indexes vs. the criteria in 
critical queries.

Good luck,


Ann
  • RE: [firebird-sup... 'Louis van Alphen' lo...@nucleo.co.za [firebird-support]
    • Re: [firebir... Ann Harrison aharri...@ibphoenix.com [firebird-support]
      • RE: [fir... 'Louis van Alphen' lo...@nucleo.co.za [firebird-support]
        • Re: ... Ann Harrison aharri...@ibphoenix.com [firebird-support]
          • ... setysvar setys...@gmail.com [firebird-support]
            • ... 'Louis van Alphen' lo...@nucleo.co.za [firebird-support]
          • ... 'Louis van Alphen' lo...@nucleo.co.za [firebird-support]

Reply via email to