Careful with the timing. You may be looking at OS memory caching the result set instead of pulling from the drive. For best bets, either re-run both queries several times, ditch the longest and shortest times, then take the mean or average times and do the comparison that way.
On Wed, Oct 15, 2014 at 7:54 PM, Michael Falconer < michael.j.falco...@gmail.com> wrote: > Igor, > > Nice one. A quick test using 10 lookup trees and the same search criteria: > *Showing rows 0 - 29 (30 total, Query took 0.4838 sec)* > > Now with UNION ALL replacing UNION: > *Showing rows 0 - 29 (30 total, Query took 0.2050 sec)* > > You weren't kidding about cheaper were you? LOL > > > On 16 October 2014 10:30, Michael Falconer <michael.j.falco...@gmail.com> > wrote: > > > Yes, I agree with the general sentiment. It is not exactly the design I > > would have chosen either, but it is what exists. Design change is > occurring > > but for the moment we are stuck with the current schema. > > > > If you cannot change the schemata to be more suitable, then your > >> demonstrated queries are very near as good as it gets. I can think of > other > >> ways to achieve the same, but nothing that would be more efficient to a > >> tangible level. Maybe someone else here can come up with something a bit > >> more tight. > >> > > You are preaching to the converted I'm afraid, but I appreciate the > points > > you made. The quoted paragraph probably answers most of my questions > > really, except for this from Igor: > > > > Unless you do expect duplicates and need to eliminate them, use UNION ALL > >> - it's much cheaper (this is assuming you insist on keeping multiple > >> tables). > >> > > Cheaper because it drops the operation to remove the duplicate records > > from the result set? I'm assuming. I'll give that a try. > > > > On 16 October 2014 10:05, RSmith <rsm...@rsweb.co.za> wrote: > > > >> > >> On 2014/10/16 00:05, Michael Falconer wrote: > >> > >>> Hi all, > >>> > >>> first off I must start with an apology. I know I'm sort of doing the > >>> wrong//... > >>> > >> > >> No need to apologise, this flies quite close to the central theme. > >> Whether you are using SQLite or any other SQL RDBMS, this is horrible DB > >> design and it is so for precisely the reasons you are asking the > question. > >> Tables should not be dynamic: The simple rule of thumb being - if you > >> cannot tell (or at least accurately predict) before-hand exactly how > many > >> tables will be in the final system, then you are doing it wrong. Why > not > >> have one set of tables and in stead of preceding each of them with a > >> <prefix>, simply add a column that can host the <prefix> as a simple > value, > >> which will immediately make your life very much easier and get the SQL > >> engine to do the work you are now trying to compensate for in a > lop-sided > >> manual kind of way. > >> > >> I can only imagine the amount of code you invested into your system to > >> track and deal with these dynamic sets of tables, so I know changing it > >> will seem like a rather large undertaking, but really it will mostly > >> involve removing loads of code to end up with a small set of simple > queries > >> that does all the work for you. > >> > >> The only reason I can imagine this sort of breakdown useful is if your > >> tree tables are all really really huge, like Gigabytes, and so there > might > >> be some efficiency to be gained from splitting it up, but I doubt this > is > >> the case. > >> > >> Just imagine the ease of those search queries you demonstrated... one > >> query, no UNIONs, a single set of joins and an extra where-clause > check... > >> Add to that increased efficiency at finding results (run one query in > stead > >> of 50). Easy-mode. As a proof-of-concept, just make an SQLite DB, one > set > >> of tables as per your list, add column(s) for prefixing as needed, > import > >> all the tables from some large set of them (specifying additionally the > >> column for the prefix) and ten run search queries on them. I am willing > to > >> bet on the fact it will be much quicker in addition to being much > simpler. > >> > >> If you cannot change the schemata to be more suitable, then your > >> demonstrated queries are very near as good as it gets. I can think of > other > >> ways to achieve the same, but nothing that would be more efficient to a > >> tangible level. Maybe someone else here can come up with something a bit > >> more tight. > >> > >> > >> thing here as this question is NOT related to sqlite. It is a general > SQL > >>> question but I ask it here because I have great respect for the answers > >>> and > >>> discussions I have seen on this forum over many years. I rarely post > >>> myself > >>> as there are always several contributors who beat me to the answer and > >>> often their response is far better than mine would have been. I'm not a > >>> code leper, I don't need the actual SQL just the method really, though > a > >>> short code example would be well received for illustration. > >>> > >>> It's about a search performed on multiple tables. However the structure > >>> of > >>> this database is somewhat unorthodox. It contains genealogical data and > >>> this is clustered into trees. Each tree has 8 tables, there can be as > >>> many > >>> as 100 trees but most installations of the associated application > >>> software > >>> contain between 5 - 50 trees. These 8 tables contain a family trees > data > >>> and are named: > >>> <prefix>addresses > >>> <prefix>connections > >>> <prefix>events > >>> <prefix>family > >>> <prefix>person > >>> <prefix>repositories > >>> <prefix>sources > >>> <prefix>texts > >>> > >>> The <prefix> changes and is unique for each tree. > >>> > >>> There are extensive search options offered to users as well as simple > >>> name > >>> searching and it is one of the best features of the app. It works > pretty > >>> well, so it ain't broke in any sense, we just wonder if there is a > better > >>> way to perform this search in SQL. Is there a general technique which > is > >>> superior either in speed, efficiency or load bearing contexts? > >>> > >>> I am sure you can see one of the pitfalls here is the exponential > growth > >>> of > >>> such a search query as both total number of trees and indeed user > search > >>> criteria increase. For each criteria component, and there are quite a > >>> few, > >>> the appropriate tables must be queried and results joined. > >>> > >>> Searches return records of individuals meeting the entered search > >>> criteria > >>> so the query focuses on the person table as it's anchor, performs > >>> searches > >>> on required other tables in the tree and joins the results to the > person > >>> data. The results from each tree search are then UNION'ed to provide > the > >>> result set. Here is a contrived example of the SQL query code which > >>> should > >>> make things clearer. > >>> > >>> The user enters simple search criteria - any person with a last name > >>> containing 'mac' and a first name containing the character 'a'. This is > >>> the > >>> resulting query (generated by php code), which searches a small 4 > family > >>> tree installation. > >>> > >>> (SELECT humo1_person.*, event_kind, event_event, address_place, > >>> address_zip > >>> FROM humo1_person > >>> LEFT JOIN humo1_events ON pers_gedcomnumber=event_person_id > >>> LEFT JOIN humo1_addresses ON pers_gedcomnumber=address_person_id > >>> WHERE pers_lastname LIKE '%mac%' > >>> AND (pers_firstname LIKE '%a%' > >>> OR (event_kind='name' AND event_event LIKE '%a%') > >>> ) > >>> GROUP BY pers_gedcomnumber > >>> ) > >>> UNION > >>> (SELECT humo2_person.*, event_kind, event_event, address_place, > >>> address_zip > >>> FROM humo2_person LEFT JOIN humo2_events ON > >>> pers_gedcomnumber=event_person_id > >>> LEFT JOIN humo2_addresses ON pers_gedcomnumber=address_ > >>> person_id > >>> WHERE pers_lastname LIKE '%mac%' > >>> AND (pers_firstname LIKE '%a%' > >>> OR (event_kind='name' AND event_event LIKE '%a%') ) > >>> GROUP BY pers_gedcomnumber > >>> ) > >>> UNION > >>> (SELECT humo3_person.*, event_kind, event_event, address_place, > >>> address_zip > >>> FROM humo3_person LEFT JOIN humo3_events ON > >>> pers_gedcomnumber=event_person_id > >>> LEFT JOIN humo3_addresses ON pers_gedcomnumber=address_ > >>> person_id > >>> WHERE pers_lastname LIKE '%mac%' > >>> AND (pers_firstname LIKE '%a%' > >>> OR (event_kind='name' AND event_event LIKE '%a%') > >>> ) > >>> GROUP BY pers_gedcomnumber > >>> ) > >>> UNION > >>> (SELECT humo4_person.*, event_kind, event_event, address_place, > >>> address_zip > >>> FROM humo4_person > >>> LEFT JOIN humo4_events ON pers_gedcomnumber=event_person_id > >>> LEFT JOIN humo4_addresses ON pers_gedcomnumber=address_ > >>> person_id > >>> WHERE pers_lastname LIKE '%mac%' > >>> AND (pers_firstname LIKE '%a%' > >>> OR (event_kind='name' AND event_event LIKE '%a%') > >>> ) > >>> GROUP BY pers_gedcomnumber > >>> ) > >>> ORDER BY pers_lastname ASC , pers_firstname ASC LIMIT 0,30 > >>> > >>> Any thoughts? Suggestions? Missiles? Good approach, bad approach, or > >>> completely off the grid? I do use sqlite quite a bit, but not on this > >>> particular project. > >>> > >>> > >>> > >> _______________________________________________ > >> sqlite-users mailing list > >> sqlite-users@sqlite.org > >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > >> > > > > > > > > -- > > Regards, > > Michael.j.Falconer. > > > > > > -- > Regards, > Michael.j.Falconer. > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users