Re: Slow retrieval of distinct on indexed fields

Dan Nelson Tue, 18 Feb 2003 08:26:16 -0800

In the last episode (Feb 18), Peter Grigor said:
> I've always thought that adding a count of index entries to the index
> file would be helpful.
> 
> Like (generic index entry): word:15:1,2,5,7,etc...
> 
> where word is the indexed value, 15 is the number of times it
> appears, and 1,2,5,7 is the record offsets (or byte offsets for
> dynamic stuff) of the appearances of the indexed value.
> 
> This way, getting a count of indexed values would be quick, and
> scanning the index would no longer be necessary, because the count
> (15) would tell you how many index entries to skip to get the next
> indexed value in the index.


What you describe above is similar to a bitmap index; they trade off a
more complicated encoding scheme for extremely fast counts.  Storing
the record offset is sort of inefficient; that's 4 (8?) bytes per
record.  If you either store a bitmap of rows that that value occurs in
(i.e. 1100101 in your example), or delta record/byte offsets
(1,+1,+3,+2,0), and then compress those, you can shrink your indexes to
an incredibly small size.  Compressed indexes take a lot longer to
update, though.

See http://lists.mysql.com/cgi-ez/ezmlm-cgi?1:mss:125271:200211:bmokiphdlmibimfljjkf
for a description of how I would implement bitmap indexes in MySQL .

-- 
        Dan Nelson
        [EMAIL PROTECTED]

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Re: Slow retrieval of distinct on indexed fields

Reply via email to