In the last episode (Feb 18), Peter Grigor said: > I've always thought that adding a count of index entries to the index > file would be helpful. > > Like (generic index entry): word:15:1,2,5,7,etc... > > where word is the indexed value, 15 is the number of times it > appears, and 1,2,5,7 is the record offsets (or byte offsets for > dynamic stuff) of the appearances of the indexed value. > > This way, getting a count of indexed values would be quick, and > scanning the index would no longer be necessary, because the count > (15) would tell you how many index entries to skip to get the next > indexed value in the index.
What you describe above is similar to a bitmap index; they trade off a more complicated encoding scheme for extremely fast counts. Storing the record offset is sort of inefficient; that's 4 (8?) bytes per record. If you either store a bitmap of rows that that value occurs in (i.e. 1100101 in your example), or delta record/byte offsets (1,+1,+3,+2,0), and then compress those, you can shrink your indexes to an incredibly small size. Compressed indexes take a lot longer to update, though. See http://lists.mysql.com/cgi-ez/ezmlm-cgi?1:mss:125271:200211:bmokiphdlmibimfljjkf for a description of how I would implement bitmap indexes in MySQL . -- Dan Nelson [EMAIL PROTECTED] --------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php