Re: [sqlite] Search query alternatives.

Stephen Chrzanowski Wed, 15 Oct 2014 17:15:12 -0700

Careful with the timing.  You may be looking at OS memory caching the
result set instead of pulling from the drive.  For best bets, either re-run
both queries several times, ditch the longest and shortest times, then take
the mean or average times and do the comparison that way.


On Wed, Oct 15, 2014 at 7:54 PM, Michael Falconer <
[email protected]> wrote:

> Igor,
>
> Nice one. A quick test using 10 lookup trees and the same search criteria:
> *Showing rows 0 - 29 (30 total, Query took 0.4838 sec)*
>
> Now with UNION ALL replacing UNION:
> *Showing rows 0 - 29 (30 total, Query took 0.2050 sec)*
>
> You weren't kidding about cheaper were you? LOL
>
>
> On 16 October 2014 10:30, Michael Falconer <[email protected]>
> wrote:
>
> > Yes, I agree with the general sentiment. It is not exactly the design I
> > would have chosen either, but it is what exists. Design change is
> occurring
> > but for the moment we are stuck with the current schema.
> >
> > If you cannot change the schemata to be more suitable, then your
> >> demonstrated queries are very near as good as it gets. I can think of
> other
> >> ways to achieve the same, but nothing that would be more efficient to a
> >> tangible level. Maybe someone else here can come up with something a bit
> >> more tight.
> >>
> > You are preaching to the converted I'm afraid, but I appreciate the
> points
> > you made. The quoted paragraph probably answers most of my questions
> > really, except for this from Igor:
> >
> > Unless you do expect duplicates and need to eliminate them, use UNION ALL
> >> - it's much cheaper (this is assuming you insist on keeping multiple
> >> tables).
> >>
> > Cheaper because it drops the operation to remove the duplicate records
> > from the result set? I'm assuming. I'll give that a try.
> >
> > On 16 October 2014 10:05, RSmith <[email protected]> wrote:
> >
> >>
> >> On 2014/10/16 00:05, Michael Falconer wrote:
> >>
> >>> Hi all,
> >>>
> >>> first off I must start with an apology. I know I'm sort of doing the
> >>> wrong//...
> >>>
> >>
> >> No need to apologise, this flies quite close to the central theme.
> >> Whether you are using SQLite or any other SQL RDBMS, this is horrible DB
> >> design and it is so for precisely the reasons you are asking the
> question.
> >> Tables should not be dynamic: The simple rule of thumb being - if you
> >> cannot tell (or at least accurately predict) before-hand exactly how
> many
> >> tables will be in the final system, then you are doing it wrong.  Why
> not
> >> have one set of tables and in stead of preceding each of them with a
> >> <prefix>, simply add a column that can host the <prefix> as a simple
> value,
> >> which will immediately make your life very much easier and get the SQL
> >> engine to do the work you are now trying to compensate for in a
> lop-sided
> >> manual kind of way.
> >>
> >> I can only imagine the amount of code you invested into your system to
> >> track and deal with these dynamic sets of tables, so I know changing it
> >> will seem like a rather large undertaking, but really it will mostly
> >> involve removing loads of code to end up with a small set of simple
> queries
> >> that does all the work for you.
> >>
> >> The only reason I can imagine this sort of breakdown useful is if your
> >> tree tables are all really really huge, like Gigabytes, and so there
> might
> >> be some efficiency to be gained from splitting it up, but I doubt this
> is
> >> the case.
> >>
> >> Just imagine the ease of those search queries you demonstrated... one
> >> query, no UNIONs, a single set of joins and an extra where-clause
> check...
> >> Add to that increased efficiency at finding results (run one query in
> stead
> >> of 50).  Easy-mode. As a proof-of-concept, just make an SQLite DB, one
> set
> >> of tables as per your list, add column(s) for prefixing as needed,
> import
> >> all the tables from some large set of them (specifying additionally the
> >> column for the prefix) and ten run search queries on them. I am willing
> to
> >> bet on the fact it will be much quicker in addition to being much
> simpler.
> >>
> >> If you cannot change the schemata to be more suitable, then your
> >> demonstrated queries are very near as good as it gets. I can think of
> other
> >> ways to achieve the same, but nothing that would be more efficient to a
> >> tangible level. Maybe someone else here can come up with something a bit
> >> more tight.
> >>
> >>
> >>  thing here as this question is NOT related to sqlite. It is a general
> SQL
> >>> question but I ask it here because I have great respect for the answers
> >>> and
> >>> discussions I have seen on this forum over many years. I rarely post
> >>> myself
> >>> as there are always several contributors who beat me to the answer and
> >>> often their response is far better than mine would have been. I'm not a
> >>> code leper, I don't need the actual SQL just the method really, though
> a
> >>> short code example would be well received for illustration.
> >>>
> >>> It's about a search performed on multiple tables. However the structure
> >>> of
> >>> this database is somewhat unorthodox. It contains genealogical data and
> >>> this is clustered into trees. Each tree has 8 tables, there can be as
> >>> many
> >>> as 100 trees but most installations of the associated application
> >>> software
> >>> contain between 5 - 50 trees. These 8 tables contain a family trees
> data
> >>> and are named:
> >>> <prefix>addresses
> >>> <prefix>connections
> >>> <prefix>events
> >>> <prefix>family
> >>> <prefix>person
> >>> <prefix>repositories
> >>> <prefix>sources
> >>> <prefix>texts
> >>>
> >>> The <prefix> changes and is unique for each tree.
> >>>
> >>> There are extensive search options offered to users as well as simple
> >>> name
> >>> searching and it is one of the best features of the app. It works
> pretty
> >>> well, so it ain't broke in any sense, we just wonder if there is a
> better
> >>> way to perform this search in SQL. Is there a general technique which
> is
> >>> superior either in speed, efficiency or load bearing contexts?
> >>>
> >>> I am sure you can see one of the pitfalls here is the exponential
> growth
> >>> of
> >>> such a search query as both total number of trees and indeed user
> search
> >>> criteria increase. For each criteria component, and there are quite a
> >>> few,
> >>> the appropriate tables must be queried and results joined.
> >>>
> >>> Searches return records of individuals meeting the entered search
> >>> criteria
> >>> so the query focuses on the person table as it's anchor, performs
> >>> searches
> >>> on required other tables in the tree and joins the results to the
> person
> >>> data. The results from each tree search are then UNION'ed to provide
> the
> >>> result set. Here is a contrived example of the SQL query code which
> >>> should
> >>> make things clearer.
> >>>
> >>> The user enters simple search criteria - any person with a last name
> >>> containing 'mac' and a first name containing the character 'a'. This is
> >>> the
> >>> resulting query (generated by php code), which searches a small 4
> family
> >>> tree installation.
> >>>
> >>> (SELECT humo1_person.*, event_kind, event_event, address_place,
> >>> address_zip
> >>>      FROM humo1_person
> >>>      LEFT JOIN humo1_events ON pers_gedcomnumber=event_person_id
> >>>      LEFT JOIN humo1_addresses ON pers_gedcomnumber=address_person_id
> >>>          WHERE pers_lastname LIKE '%mac%'
> >>>            AND (pers_firstname LIKE '%a%'
> >>>                OR (event_kind='name' AND event_event LIKE '%a%')
> >>>            )
> >>>      GROUP BY pers_gedcomnumber
> >>> )
> >>> UNION
> >>>      (SELECT humo2_person.*, event_kind, event_event, address_place,
> >>> address_zip
> >>>          FROM humo2_person LEFT JOIN humo2_events ON
> >>> pers_gedcomnumber=event_person_id
> >>>          LEFT JOIN humo2_addresses ON pers_gedcomnumber=address_
> >>> person_id
> >>>              WHERE pers_lastname LIKE '%mac%'
> >>>                AND (pers_firstname LIKE '%a%'
> >>>                    OR (event_kind='name' AND event_event LIKE '%a%') )
> >>>          GROUP BY pers_gedcomnumber
> >>>      )
> >>> UNION
> >>>      (SELECT humo3_person.*, event_kind, event_event, address_place,
> >>> address_zip
> >>>          FROM humo3_person LEFT JOIN humo3_events ON
> >>> pers_gedcomnumber=event_person_id
> >>>          LEFT JOIN humo3_addresses ON pers_gedcomnumber=address_
> >>> person_id
> >>>              WHERE pers_lastname LIKE '%mac%'
> >>>                AND (pers_firstname LIKE '%a%'
> >>>                    OR (event_kind='name' AND event_event LIKE '%a%')
> >>>                )
> >>>          GROUP BY pers_gedcomnumber
> >>>      )
> >>> UNION
> >>>      (SELECT humo4_person.*, event_kind, event_event, address_place,
> >>> address_zip
> >>>          FROM humo4_person
> >>>          LEFT JOIN humo4_events ON pers_gedcomnumber=event_person_id
> >>>          LEFT JOIN humo4_addresses ON pers_gedcomnumber=address_
> >>> person_id
> >>>              WHERE pers_lastname LIKE '%mac%'
> >>>                AND (pers_firstname LIKE '%a%'
> >>>                    OR (event_kind='name' AND event_event LIKE '%a%')
> >>>                )
> >>>          GROUP BY pers_gedcomnumber
> >>>      )
> >>>      ORDER BY pers_lastname ASC , pers_firstname ASC LIMIT 0,30
> >>>
> >>> Any thoughts? Suggestions? Missiles? Good approach, bad approach, or
> >>> completely off the grid? I do use sqlite quite a bit, but not on this
> >>> particular project.
> >>>
> >>>
> >>>
> >> _______________________________________________
> >> sqlite-users mailing list
> >> [email protected]
> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >>
> >
> >
> >
> > --
> > Regards,
> >      Michael.j.Falconer.
> >
>
>
>
> --
> Regards,
>      Michael.j.Falconer.
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Search query alternatives.

Reply via email to