> 1/3! Very nice. Is there any paper showing how to do this?
I have no idea. It's a pretty standard Jenkins hash table; the algorithm
keeps the table pretty flat, and collisions 1-5 deep usually aren't much
of a speed problem. Since 1.4.8, you can seed the hash table to start
at a specific size, so you could benchmark a larger pre-sized table
against one that grew organically.
> I'm using the last spymemcached release 2.7.3.
> This is caused by validateKey method in MemcachedClient:
>
> for(byte b : keyBytes) {
> if(b == ' ' || b == '\n' || b == '\r' || b == 0) {
> throw new IllegalArgumentException("Key contains invalid
> characters: ``" + key + "''");
>
> What java client do you recommend?
There should be a flag telling the client not to validate the key if
you're in binary protocol (binprot) mode. Dustin? Anyone?
Spy should work fine; just find out how to get it to not call validateKey.
> I have items with 24 byte keys and 12 byte values. Hence, I was
> expecting an item size around 36 bytes, rounded to a slab of 48 bytes.
> The smallest slab size I get in memcached is 96 bytes, but even so the
> items are being stored in 128 bytes.
> What am I doing wrong?
Each item has to be tracked with an item structure: internal flags
(8 bits), item flags (32 bits), two pointers for the LRU (16 bytes),
expiration time, CAS, etc. So there's a minimum item size of 40ish bytes
on a 64-bit system.
You can verify with the "./sizes" tool. On my system it says 48 bytes (no
CAS), so if you use -C and you have 36 bytes of key plus value, each item
needs at least 84 bytes.
So fitting that into a slab class of 96 bytes would waste 12 bytes per
item.
Now, look at the -n startup value, which is the "minimum size of a
key/value" ... so the starting chunk size is 48 bytes + 48 bytes by
default (or 96).
If we set that to -n 36 (your actual smallest size), that gives us an
initial slab size of 88 bytes (48 + 36 == 84, + 4 to make it 8-byte
aligned). So then you lose 4 bytes.
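
The arithmetic above can be sketched in a few lines of Python. This is only
a rough model of the numbers in this thread, not memcached's actual sizing
code: the 48-byte header is what "./sizes" reported on my system without
CAS, and the names ITEM_HEADER, align_up and first_chunk_size are made up
for illustration. It ignores any other per-item overhead.

```python
# Rough model of the sizing arithmetic in this thread; not memcached's code.
ITEM_HEADER = 48   # bytes; value reported by ./sizes without CAS (assumed)
ALIGN = 8          # chunk sizes are padded to 8-byte alignment


def align_up(n, align=ALIGN):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def first_chunk_size(n_opt, header=ITEM_HEADER):
    """Smallest slab chunk: item header plus the -n value, aligned."""
    return align_up(header + n_opt)


payload = 24 + 12                  # 24-byte key + 12-byte value
needed = ITEM_HEADER + payload     # bytes one item actually needs

for n_opt in (48, 36):             # default -n, then -n 36
    chunk = first_chunk_size(n_opt)
    print(f"-n {n_opt}: chunk {chunk} bytes, wasted {chunk - needed} per item")
```

With the default -n 48 this prints a 96-byte chunk with 12 bytes wasted;
with -n 36 it prints an 88-byte chunk with 4 bytes wasted.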
That gets you an awful lot closer. However, if your keys are 24 bytes
because they're base64 encoded, but you can get them down to 8 or 16 bytes
by hashing them via SHA-1, you can adjust the -n even lower.
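
A minimal sketch of the key-shortening idea, assuming you truncate the
20-byte SHA-1 digest to the length you want (the truncation, the short_key
helper and the sample key are my assumptions for illustration, not
something your client gives you):

```python
import hashlib


def short_key(original_key: str, nbytes: int = 16) -> bytes:
    """Return the first nbytes of the SHA-1 digest of original_key."""
    if not 1 <= nbytes <= 20:
        raise ValueError("SHA-1 digests are 20 bytes long")
    return hashlib.sha1(original_key.encode("utf-8")).digest()[:nbytes]


# Made-up 24-character base64-style key, shortened to 16 raw bytes.
key = short_key("dXNlcjoxMjM0NTY3ODkwYWJj", 16)
print(len(key))
```

Two caveats: the raw digest can contain spaces, newlines or zero bytes, so
it only works as a key if the client lets them through (binary protocol,
no key validation), and re-encoding it as text makes it longer again. A
shorter truncation also raises the odds of two keys colliding.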
-Dormando