Re: [elephant-devel] Lisp Btrees: design considerations

Alex Mizrahi Thu, 15 May 2008 00:40:50 -0700

RLR> The purpose of compression is of course NOT to make the store smaller.
RLR> It is to decrease the amount of I/O that is required to access the
RLR> store.


time to do I/O does not linearly depend on I/O amount.

for example, HDD reads data in blocks, with each block definitely not lessthan 512 bytes (most likely larger due to HDD doing read-aheads).if you compress your data into 50 bytes, 10 times, it won't make I/O anyhowfaster.

moreover, random reads for HDD are orders (!) of magnitude slower thanlinear reads -- time to reposition device head is larger than time readingconsecutive blocks.


i can show this in numbers, just checked on iometer, doing random reads:
512 byte blocks: 106 operations per second (~10 ms seek time), 0.05 MiB/sec
64 KiB   blocks:  95 operations per second (~10 ms seek time), 5.95 MiB/sec

you see, we have 120 times (!) more I/O, but very low difference in numberof operations per second.so compression only will matter to people who store at least hundred KiB astheir values.

or it will matter if people are doing lots of linear reads -- but it's hardfor me to imagine such database access patterns.


RLR> there would be some databases for which it might be CPU-bound.  It is
RLR> hard for me to imagine this is true of most databases.

then it's worth defining what is "most databases" in case of elephant --
optimization without having a concrete target absolutely has no sense.

IMHO most people will be working with databases smaller than fewgigabytes --so read I/O has no sense for them (DDR reads are on scale of gigabytes persecond),

write I/O might matter, though.

RLR> If one imagines an integer-valued slot with a Poisson distribution of
RLR> values, for example having an average value of less than a thousand
RLR> but still occasionally containing a much large number, then one can
RLR> expect a simple codec that employs this fact to use about 10 bits for
RLR> each value, and 30 bits for the outliers.  The result is again a
RLR> 3-fold reduction in I/O.

bits, huh? what storage nowadays works on bit level?

even cache lines are 32 bytes wide, so make no difference to read 1 byte or32 of them.


RLR> Object pointers may of course cross the entire space in theory, but
RLR> OIDs that we store in the database may indeed come from a set of quite
RLR> knowable size, allowing us to use less than 64 bits or 32 bits to
RLR> represent them.  If we know the type of a slot to be only OIDs of a
RLR> particular class, we may very well know that that class has only 2**20
RLR> members, and therefore demands only 20 bits, as opposed to 64, to
RLR> represent the entire space of possible OIDs.

if you expect database being large enough for I/O to matter, why do youthink then that there would be that few objects?


_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

Re: [elephant-devel] Lisp Btrees: design considerations

Reply via email to