The bash script below attempts to quantify the relationship between SQLite (3.5.0 alpha) page_size and cache_size towards the speed of populating a large table with many indexes.
First some rough metrics on the final populated database: The database is approximately 208M in size, consisting of a single table, foo, and its associated indexes. The table 'foo' ends up having 177,147 randomly populated rows of a few hundred bytes each. Its schema can be gleaned from the script. The script builds a brand new database from scratch for each page_size/cache_size combination for each run to largely eliminate operating system cache effects. #!/bin/bash SQLITE=./sqlite3 for S in 4096 8192 16384 32768; do for C in 1000 2000 4000 8000 16000; do rm -f foo.db foo.csv echo " pragma page_size=$S; pragma default_cache_size=$C; CREATE TABLE foo( a text unique, b text unique not null, c text unique not null, primary key(c,a,b) ); CREATE INDEX foo_ba on foo(b,a); " | $SQLITE foo.db $SQLITE >foo.csv <<EOF create view v1 as select 1 union all select 2 union all select 3; select hex(randomblob(17)), hex(randomblob(13)), hex(randomblob(131)) from v1,v1,v1,v1,v1,v1,v1,v1,v1,v1,v1; EOF echo -n "page_size $S, cache_size $C : " time $SQLITE foo.db ".import foo.csv foo" 2>&1 | head -1 done done The raw output of the script on a 512M Linux box: page_size 4096, cache_size 1000 : 39.72user 30.88system 4:32.46elapsed 25%CPU page_size 4096, cache_size 2000 : 40.67user 26.34system 4:46.75elapsed 23%CPU page_size 4096, cache_size 4000 : 40.47user 21.61system 4:34.05elapsed 22%CPU page_size 4096, cache_size 8000 : 41.80user 13.37system 4:55.15elapsed 18%CPU page_size 4096, cache_size 16000 : 42.23user 7.58system 4:10.79elapsed 19%CPU page_size 8192, cache_size 1000 : 40.62user 37.50system 3:11.05elapsed 40%CPU page_size 8192, cache_size 2000 : 43.01user 26.60system 3:04.52elapsed 37%CPU page_size 8192, cache_size 4000 : 42.85user 16.55system 2:57.13elapsed 33%CPU page_size 8192, cache_size 8000 : 43.62user 8.08system 2:34.28elapsed 33%CPU page_size 8192, cache_size 16000 : 43.11user 2.75system 1:48.53elapsed 42%CPU page_size 16384, cache_size 1000 : 43.07user 47.92system 2:19.82elapsed 65%CPU page_size 16384, cache_size 2000 : 42.41user 31.77system 1:59.79elapsed 61%CPU page_size 16384, cache_size 4000 : 42.38user 18.70system 1:47.69elapsed 56%CPU page_size 16384, cache_size 8000 : 41.83user 9.06system 1:18.35elapsed 64%CPU page_size 16384, cache_size 16000 : 41.28user 8.36system 1:00.16elapsed 82%CPU page_size 32768, cache_size 1000 : 44.19user 52.55system 2:03.40elapsed 78%CPU page_size 32768, cache_size 2000 : 43.15user 27.36system 1:35.95elapsed 73%CPU page_size 32768, cache_size 4000 : 43.18user 11.14system 1:10.48elapsed 77%CPU page_size 32768, cache_size 8000 : 42.91user 10.34system 1:04.69elapsed 82%CPU page_size 32768, cache_size 16000 : 42.87user 10.28system 1:02.35elapsed 85%CPU The output regrouped by equal sized page_size x cache_size buckets, for equivalent total sqlite cache size per run: -- ~16M total cache (1/16th of database pages in cache) page_size 4096, cache_size 4000 : 40.47user 21.61system 4:34.05elapsed 22%CPU page_size 8192, cache_size 2000 : 43.01user 26.60system 3:04.52elapsed 37%CPU page_size 16384, cache_size 1000 : 43.07user 47.92system 2:19.82elapsed 65%CPU -- ~32M total cache (1/8th of database pages in cache) page_size 4096, cache_size 8000 : 41.80user 13.37system 4:55.15elapsed 18%CPU page_size 8192, cache_size 4000 : 42.85user 16.55system 2:57.13elapsed 33%CPU page_size 16384, cache_size 2000 : 42.41user 31.77system 1:59.79elapsed 61%CPU page_size 32768, cache_size 1000 : 44.19user 52.55system 2:03.40elapsed 78%CPU -- ~64M total cache (a quarter of database pages in cache) page_size 4096, cache_size 16000 : 42.23user 7.58system 4:10.79elapsed 19%CPU page_size 8192, cache_size 8000 : 43.62user 8.08system 2:34.28elapsed 33%CPU page_size 16384, cache_size 4000 : 42.38user 18.70system 1:47.69elapsed 56%CPU page_size 32768, cache_size 2000 : 43.15user 27.36system 1:35.95elapsed 73%CPU -- ~128M total cache (half of database pages in cache) page_size 8192, cache_size 16000 : 43.11user 2.75system 1:48.53elapsed 42%CPU page_size 16384, cache_size 8000 : 41.83user 9.06system 1:18.35elapsed 64%CPU page_size 32768, cache_size 4000 : 43.18user 11.14system 1:10.48elapsed 77%CPU -- ~256M total cache (all database pages in cache) page_size 16384, cache_size 16000 : 41.28user 8.36system 1:00.16elapsed 82%CPU page_size 32768, cache_size 8000 : 42.91user 10.34system 1:04.69elapsed 82%CPU Here we see the output of the script in tabular form: INSERT Wall Clock Time In Seconds (lower is better) page_size 4096 8192 16384 32768 ---- ---- ----- ----- 16000 250 108 60 62 8000 295 154 78 64 cache_size 4000 274 177 107 70 2000 286 184 119 95 1000 272 191 139 123 One might expect to see similar timings for equal page_size x cache_size values, but this is not the case. Take, for example, the case where a total of roughly 64M of sqlite cache is used. Elapsed time varies widely. -- ~64M total cache (a quarter of database pages in cache) page_size 4096, cache_size 16000 : 42.23user 7.58system 4:10.79elapsed 19%CPU page_size 8192, cache_size 8000 : 43.62user 8.08system 2:34.28elapsed 33%CPU page_size 16384, cache_size 4000 : 42.38user 18.70system 1:47.69elapsed 56%CPU page_size 32768, cache_size 2000 : 43.15user 27.36system 1:35.95elapsed 73%CPU We see that when only 1/4 of the database is able to fit into sqlite cache, it can be 2.6 times more speed efficient to favor a large page_size over a large cache size to produce an equivalent amount of sqlite cache. This supports the argument that sqlite performs better when locality of reference improves. When the entire database can fit into sqlite cache, however, we can see that having a smaller page size can be marginally more speed efficient: -- ~256M total cache (all database pages in cache) page_size 32768, cache_size 8000 : 42.91user 10.34system 1:04.69elapsed 82%CPU page_size 16384, cache_size 16000 : 41.28user 8.36system 1:00.16elapsed 82%CPU page_size = 4096, cache_size = 64000 : 41.92user 2.48system 0:58.87elapsed 75%CPU (*) (*) last row required a re-run of the script with new parameters These figures are naturally schema and data dependent, but they do highlight some basic trends. I maintain that pre-reserving contiguous blocks of pages for each table and index would considerably improve sqlite insert speed into multi-index tables. If this were the case, locality of reference could be improved on a per table/index basis, and you might be able to use smaller page sizes for further speed gains. If wasted space is a concern, the unused pre-reserved pages could be collected with a VACUUM after table population. ____________________________________________________________________________________ Yahoo! oneSearch: Finally, mobile search that gives answers, not web links. http://mobile.yahoo.com/mobileweb/onesearch?refer=1ONXIC ----------------------------------------------------------------------------- To unsubscribe, send email to [EMAIL PROTECTED] -----------------------------------------------------------------------------