On Sat, 7 Jan 2017 17:54:20 -0500
Gan Uesli Starling <g...@starling.us> wrote:

> As for the suggestion to 'normalize' my four tables to one ... yes, I 
> could  do that ... but it would make for a LOT of redundant data 
> duplicated in plural columns, which seems to me in my doubtless
> ignorant newbie status, as missing the point for a database. 

Hello Gan! 

It's been 15 years and counting.  Funny our paths should cross here.  

Regarding your table design, I wouldn't necessarly combine your table
regions, but I would change how you define your aggregations.  

I noticed in your SQL the comment "SQL doesn't support arrays".
Understanding *why* there are no arrays in SQL is a step in
understanding how to design and use SQL databases.  In SQL generally --
and in the relational model, strictly -- all data are dealt with as
*sets*.  Not lists, not arrays.  Set elements have no defined order.
That is, unlike an array, all attributes are explicitly expressed as
data.  No meaning attaches to the fact that an element is the Nth
element of the set. (You doubtless at some time have come across the
problem of dealing with an array element and not knowing where it stood
in the array.  In SQL, that information must be explicit in order to
exist!)  

The advantage of dealing with a single datatype is manifested in the
simplicity of the relational operators.  By virtue of adding just one
column, arrays become tables and are, yes, supported.  To a one,
flavors of SQL with array support do so with limitations, even though
they *only* add complexity.  There is no array operation that cannot be
expressed in SQL; ergo there is is no need for array notation in SQL.  

Once you think of your arrays as vertical instead of horizontal, if you
will, your tables become easier to design and manipulate.  For example,
instead of 

CREATE TABLE world (
        iso CHAR(2) PRIMARY KEY,
    un CHAR(3),
    country VARCHAR(45),
        count SMALLINT DEFAULT 0, /* Total Count, needs daily update */
        _160m SMALLINT DEFAULT 0, /* Band Count, need daily update for
each */ _80m SMALLINT DEFAULT 0,
        _60m SMALLINT DEFAULT 0,
        ...

I would suggest two tables: 

CREATE TABLE world (
        iso CHAR(2) PRIMARY KEY,
    un CHAR(3),
    country VARCHAR(45)
);

Create Table Aggr ( 
        iso CHAR(2) not NULL foreign key references world(iso),         
        size int not NULL, 
        total int not NULL,  -- not "count" because keyword
        primary key (iso, size)
);

Now, it's not clear to me that the Aggr table is well advised.  I think
it might better be a view.  Definitely I would define it as a view and
try updating the table from that view.  If you're frequently
referencing whole chunks of Aggr, it might make sense to maintain the
table.  If you're usually referencing only a few rows of it, it
probably makes more sense to keep just the view.  

In any case, updating the above table is simpler and almost certainly
faster.  Instead of one update per column (which IIUC is the genesis of
this thread), 

UPDATE world SET _160m = (SELECT count from country_160m WHERE country
= world.country) WHERE country IN (SELECT country FROM country_160m);

use one update per table.  First, make a tiny static table Bands of the
interesting bands you want (160, 80, etc.).  Then: 

update Aggr set total = (
        select count(*) from "something" 
                                -- not sure what the real source is
        where size = Bands.size
        and iso = Aggr.iso
        and size = Aggr.size
) 
where exists (
        select 1 from "something"
        where iso = Aggr.iso
        and size = Aggr.size
);

That updates all bands for all countries in one swell foop.  

If you prefer to see bands-by-country, as in your world table, create a
view that uses a technique known as "folding".  It's verbose (as SQL is
wont) but it can be surprisingly fast.  But don't be two hasty: except
for reports, most queries against the Aggr table won't need more than a
view rows.  It is more efficient and as easy to use "world" would be.
Instead of, 

        select ... from world where _160m > 7;

you have 

        select ... from Aggr where size = 160 and total > 7;

The SQL becomes more regular because you don't have to write different
queries depending on which size (band) you're interested in.  The
execution is faster because there's less I/O: the row is narrower, and
it's found by a binary search on the index supporting the primary key.
By contrast, your "world" table would need an index on every _nnnm
column which, besides being tedious to define, would increase the cost
of your inserts.  

One last word on aggregation and views.  

Many programmers seem to come to SQL with the instinct of pre-computing
their aggregates.  Most programming languages have no support for
computation with sets, no built-in support for aggregation and
quantification.  Furthermore, because most applications deal with a
static snapshot of the data, the instinct to laboriously massage the
input into ever finer form is continually rewarded.  

SQL makes aggregation and quantification easy, and imposes an I/O cost
on pre-computed aggregations.  The input is volatile, and any
aggregation is only as good as the last update.  Those features often
combine to discourage pre-computing.  

We see many questions here about using triggers or somesuch to maintain
aggregations.  Often the first question and best answer is, why not use
a view?  

I resorted to using a table of aggregations on occasion, but only
after writing the view and optimizing support for it.  Truth be told,
even on those occasions the tables really shouldn't have been
necessary, but the convenience of the views encouraged other users to
write other views atop them -- turtles all the way up! -- to the point
that the DBMS threw in the towel and gave the query optimizer the day
off.  Having a small budget for towels, my only option was to
"tabe-lize" the view.  With a little discipline, ISTM you can avoid
that fate.  

I hope this is useful to  you in your project.  

Regards, 

--jkl

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to