On Sun, 2005-08-21 at 11:36 -0400, Jason Tackaberry wrote:
> Assuming indexed queries, it's the number of rows returned that matters,
> not the number of selects. A select which uses an indexed column which
> returns 0 rows is very, very fast. It's practically a no-op.

Attached is some (very ugly) test code which supports this statement.
Here's the program output:

    *** Creating single table ...
    Created table with 300000 rows.
    Query for known dir_id returned 150000 rows, took 1.6794 seconds.
    Query for unknown dir_id returned 0 rows, took 0.0718 seconds.
    *** Creating multi table ...
    Created table with 30000 rows.
    Query for known dir_id returned 15000 rows, took 0.3160 seconds.
    Query for unknown dir_id returned 0 rows, took 0.0007 seconds.

There are 30000 files total, with 2 directories, 15k files in each
directory. Each file has 10 attributes, hence the single table has 10
times more rows than the multi table.

Actually I'm a bit surprised that a query for unknown dir_id took as
long as it did in the single table test. It's certainly much more than
10 times slower than the multi table test, which suggests there might be
some extra I/O involved. I'm not sure what's happening there.

But at any rate, I think this convincingly demonstrates the
table-per-file-type, column-per-attribute approach is going to yield
appreciably better performance.

Cheers,
Jason.
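
For readers without the attachment, here is a minimal sketch of this kind of comparison. It is not the attached test code: it assumes SQLite through Python's sqlite3 module, and the table names (attrs, files), column names (attr0..attr9), and helper functions are all made up for illustration. Timings will differ from the figures above depending on the database, disk, and cache state.

```python
import sqlite3
import time


def build_single(conn, n_dirs=2, files_per_dir=15000, n_attrs=10):
    """Hypothetical single-table layout: one row per (file, attribute)."""
    conn.execute(
        "CREATE TABLE attrs (dir_id INTEGER, file_id INTEGER, name TEXT, value TEXT)"
    )
    conn.execute("CREATE INDEX attrs_dir_idx ON attrs (dir_id)")
    rows = (
        (d, d * files_per_dir + f, f"attr{a}", "x")
        for d in range(1, n_dirs + 1)
        for f in range(files_per_dir)
        for a in range(n_attrs)
    )
    conn.executemany("INSERT INTO attrs VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def build_multi(conn, n_dirs=2, files_per_dir=15000, n_attrs=10):
    """Hypothetical per-type layout: one row per file, one column per attribute."""
    cols = ", ".join(f"attr{a} TEXT" for a in range(n_attrs))
    conn.execute(f"CREATE TABLE files (dir_id INTEGER, file_id INTEGER, {cols})")
    conn.execute("CREATE INDEX files_dir_idx ON files (dir_id)")
    placeholders = ", ".join(["?"] * (n_attrs + 2))
    rows = (
        (d, d * files_per_dir + f) + ("x",) * n_attrs
        for d in range(1, n_dirs + 1)
        for f in range(files_per_dir)
    )
    conn.executemany(f"INSERT INTO files VALUES ({placeholders})", rows)
    conn.commit()


def timed_query(conn, table, dir_id):
    """Run an indexed lookup on dir_id; return (row count, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE dir_id = ?", (dir_id,)
    ).fetchall()
    return len(rows), time.perf_counter() - start


if __name__ == "__main__":
    for label, build, table in (
        ("single", build_single, "attrs"),
        ("multi", build_multi, "files"),
    ):
        conn = sqlite3.connect(":memory:")
        build(conn)
        print(f"*** {label} table")
        # dir_id 1 exists; dir_id 999 does not.
        for kind, dir_id in (("known", 1), ("unknown", 999)):
            count, secs = timed_query(conn, table, dir_id)
            print(f"Query for {kind} dir_id returned {count} rows, took {secs:.4f} seconds.")
        conn.close()
```

Note that this sketch uses an in-memory database, so it will not reproduce any disk I/O effects such as the one suspected in the single table case; pointing sqlite3.connect() at a file would be closer to a real test.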