On Mar 21, 2013, at 6:41 PM, Peter Haworth <p...@lcsql.com> wrote:

> I found the code in the two attached files (are they allowed on this list?)
> on the web a while back.  The claim was that it created an information
> schema database from an sqlite db.  I know nothing about Lua but I managed
> to get as far as creating the schema. Unfortunately, the url I got it from
> no longer has the code, but I'd be interested to know if it works.

You mean these?

http://alt.textdrive.com/svn/altdev/IMDB/Info.ddl
http://alt.textdrive.com/svn/altdev/IMDB/Info.lua

Yes, they work, as far as capturing the information schema goes.

Such a queryable data dictionary is not there just to be cute; it performs
real work by making the database easy to introspect.
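For readers without the Lua files handy, here is a minimal sketch of the idea using Python's sqlite3 module. The info_column table and its columns are made up for illustration, not the actual Info.ddl schema: each user table is discovered via sqlite_master, its columns walked with PRAGMA table_info, and the results land in a queryable dictionary table.

```python
import sqlite3

def build_info_schema(db):
    # Hypothetical, minimal analogue of Info.ddl/Info.lua: one row per column.
    db.execute("""CREATE TABLE IF NOT EXISTS info_column (
                      table_name TEXT, column_name TEXT,
                      column_type TEXT, is_pk INTEGER)""")
    tables = [r[0] for r in db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' AND name <> 'info_column'")]
    for t in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, ctype, _notnull, _dflt, pk in db.execute(
                "PRAGMA table_info(%s)" % t):
            db.execute("INSERT INTO info_column VALUES (?, ?, ?, ?)",
                       (t, name, ctype, pk))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT)")
build_info_schema(db)
rows = list(db.execute("SELECT * FROM info_column ORDER BY column_name"))
# rows now describes the person table, one tuple per column.
```

Once the dictionary table exists, everything downstream can be plain SQL over it.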

Specifically, in this case, the data dictionary is used to automatically drive 
the ETL (Extract, Transform, Load) process that loads the IMDB database [1]:

http://alt.textdrive.com/svn/altdev/IMDB/ETL.ddl
http://alt.textdrive.com/svn/altdev/IMDB/ETL.lua

The entire ETL is driven by introspecting the IMDB schema, courtesy of the data 
dictionary:

http://alt.textdrive.com/svn/altdev/IMDB/IMDB.ddl
http://alt.textdrive.com/svn/altdev/IMDB/IMDB.lua

So, for example, given the biographies.list.gz IMDB file, the ETL will 
automatically populate its corresponding person_biography table by:

(1) Extracting the raw biography data into an auto-generated staging table
(2) Transforming all the appropriate foreign keys by populating their reference 
tables as needed
(3) Loading the final person_biography table by automatically resolving all its 
references
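In miniature, those three steps look roughly like this (Python/sqlite3, with made-up column names standing in for the real IMDB.ddl schema; the actual ETL.lua generates all of this from the data dictionary instead of hard-coding it):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical target schema, loosely mirroring IMDB.ddl's person_biography.
db.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT UNIQUE);
CREATE TABLE person_biography (person_id INTEGER REFERENCES person,
                               biography TEXT);
""")

raw = [("Astaire, Fred", "Dancer and actor."),
       ("Bacall, Lauren", "Actress.")]

# (1) Extract: raw rows go into a staging table as-is.
db.execute("CREATE TABLE person_biography_extract "
           "(person_name TEXT, biography TEXT)")
db.executemany("INSERT INTO person_biography_extract VALUES (?, ?)", raw)

# (2) Transform: populate the reference table behind each foreign key.
db.execute("""INSERT OR IGNORE INTO person (person_name)
              SELECT DISTINCT person_name FROM person_biography_extract""")

# (3) Load: resolve the references and fill the final table.
db.execute("""INSERT INTO person_biography (person_id, biography)
              SELECT p.person_id, e.biography
              FROM person_biography_extract AS e
              JOIN person AS p USING (person_name)""")

loaded = db.execute("SELECT COUNT(*) FROM person_biography").fetchone()[0]
```

The point of the data dictionary is that the staging table, the reference inserts, and the final resolving join can all be generated mechanically from the schema, per table, rather than written by hand as above.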

Here is a typical run log:

2013-02-16 16:15:59 [IMDB.Process] Inflating /Volumes/Queens/IMDB/imdb/biographies.list.gz
2013-02-16 16:16:14 [IMDB.Process] Converting /Volumes/Queens/IMDB/imdb/biographies.list
2013-02-16 16:16:35 [IMDB.Process] Processing /Volumes/Queens/IMDB/imdb/biographies.list.txt
2013-02-16 16:16:35 [ETL.Extract] Inserting into person_biography_extract
2013-02-16 16:21:11 [ETL.Extract] Inserted 2,585,624 rows into person_biography_extract
2013-02-16 16:21:11 [ETL.Transform] Inserting into biography
2013-02-16 16:21:22 [ETL.Transform] Inserted 21 rows into biography
2013-02-16 16:21:22 [ETL.Description] Updating biography
2013-02-16 16:21:22 [ETL.Description] Updated 21 rows in biography
2013-02-16 16:21:22 [ETL.Transform] Inserting into person
2013-02-16 16:21:34 [ETL.Transform] Inserted 540,346 rows into person
2013-02-16 16:21:34 [ETL.Load] Inserting into person_biography
2013-02-16 16:22:23 [ETL.Load] Inserted 2,585,624 rows into person_biography

And that's that. All automated and (meta)data driven, courtesy of a proper data 
dictionary.

If only SQLite could provide such an information schema out of the box; now that 
would be real added value :))
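For what it's worth, a rough approximation is already possible by joining sqlite_master against PRAGMA table_info, and newer SQLite builds (3.16+) expose the pragma as a table-valued function, so a single query can enumerate every column in the database. A sketch, with an illustrative table and assuming a sufficiently recent SQLite:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT)")

# pragma_table_info as a table-valued function needs SQLite 3.16 or later.
cols = db.execute("""
    SELECT m.name AS table_name, c.name AS column_name, c.type
    FROM sqlite_master AS m
    JOIN pragma_table_info(m.name) AS c
    WHERE m.type = 'table'
    ORDER BY m.name, c.cid
""").fetchall()
```

It is still a far cry from a real, persistent information_schema, but it covers the basic "what tables and columns do I have" introspection.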


[1] http://www.imdb.com/interfaces

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
