On Sat, 7 Jan 2017 17:54:20 -0500 Gan Uesli Starling <g...@starling.us> wrote:
> As for the suggestion to 'normalize' my four tables to one ... yes, I > could do that ... but it would make for a LOT of redundant data > duplicated in plural columns, which seems to me in my doubtless > ignorant newbie status, as missing the point for a database. Hello Gan! It's been 15 years and counting. Funny our paths should cross here. Regarding your table design, I wouldn't necessarly combine your table regions, but I would change how you define your aggregations. I noticed in your SQL the comment "SQL doesn't support arrays". Understanding *why* there are no arrays in SQL is a step in understanding how to design and use SQL databases. In SQL generally -- and in the relational model, strictly -- all data are dealt with as *sets*. Not lists, not arrays. Set elements have no defined order. That is, unlike an array, all attributes are explicitly expressed as data. No meaning attaches to the fact that an element is the Nth element of the set. (You doubtless at some time have come across the problem of dealing with an array element and not knowing where it stood in the array. In SQL, that information must be explicit in order to exist!) The advantage of dealing with a single datatype is manifested in the simplicity of the relational operators. By virtue of adding just one column, arrays become tables and are, yes, supported. To a one, flavors of SQL with array support do so with limitations, even though they *only* add complexity. There is no array operation that cannot be expressed in SQL; ergo there is is no need for array notation in SQL. Once you think of your arrays as vertical instead of horizontal, if you will, your tables become easier to design and manipulate. For example, instead of CREATE TABLE world ( iso CHAR(2) PRIMARY KEY, un CHAR(3), country VARCHAR(45), count SMALLINT DEFAULT 0, /* Total Count, needs daily update */ _160m SMALLINT DEFAULT 0, /* Band Count, need daily update for each */ _80m SMALLINT DEFAULT 0, _60m SMALLINT DEFAULT 0, ... I would suggest two tables: CREATE TABLE world ( iso CHAR(2) PRIMARY KEY, un CHAR(3), country VARCHAR(45) ); Create Table Aggr ( iso CHAR(2) not NULL foreign key references world(iso), size int not NULL, total int not NULL, -- not "count" because keyword primary key (iso, size) ); Now, it's not clear to me that the Aggr table is well advised. I think it might better be a view. Definitely I would define it as a view and try updating the table from that view. If you're frequently referencing whole chunks of Aggr, it might make sense to maintain the table. If you're usually referencing only a few rows of it, it probably makes more sense to keep just the view. In any case, updating the above table is simpler and almost certainly faster. Instead of one update per column (which IIUC is the genesis of this thread), UPDATE world SET _160m = (SELECT count from country_160m WHERE country = world.country) WHERE country IN (SELECT country FROM country_160m); use one update per table. First, make a tiny static table Bands of the interesting bands you want (160, 80, etc.). Then: update Aggr set total = ( select count(*) from "something" -- not sure what the real source is where size = Bands.size and iso = Aggr.iso and size = Aggr.size ) where exists ( select 1 from "something" where iso = Aggr.iso and size = Aggr.size ); That updates all bands for all countries in one swell foop. If you prefer to see bands-by-country, as in your world table, create a view that uses a technique known as "folding". It's verbose (as SQL is wont) but it can be surprisingly fast. But don't be two hasty: except for reports, most queries against the Aggr table won't need more than a view rows. It is more efficient and as easy to use "world" would be. Instead of, select ... from world where _160m > 7; you have select ... from Aggr where size = 160 and total > 7; The SQL becomes more regular because you don't have to write different queries depending on which size (band) you're interested in. The execution is faster because there's less I/O: the row is narrower, and it's found by a binary search on the index supporting the primary key. By contrast, your "world" table would need an index on every _nnnm column which, besides being tedious to define, would increase the cost of your inserts. One last word on aggregation and views. Many programmers seem to come to SQL with the instinct of pre-computing their aggregates. Most programming languages have no support for computation with sets, no built-in support for aggregation and quantification. Furthermore, because most applications deal with a static snapshot of the data, the instinct to laboriously massage the input into ever finer form is continually rewarded. SQL makes aggregation and quantification easy, and imposes an I/O cost on pre-computed aggregations. The input is volatile, and any aggregation is only as good as the last update. Those features often combine to discourage pre-computing. We see many questions here about using triggers or somesuch to maintain aggregations. Often the first question and best answer is, why not use a view? I resorted to using a table of aggregations on occasion, but only after writing the view and optimizing support for it. Truth be told, even on those occasions the tables really shouldn't have been necessary, but the convenience of the views encouraged other users to write other views atop them -- turtles all the way up! -- to the point that the DBMS threw in the towel and gave the query optimizer the day off. Having a small budget for towels, my only option was to "tabe-lize" the view. With a little discipline, ISTM you can avoid that fate. I hope this is useful to you in your project. Regards, --jkl _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users