On Jan 19, 12:44 pm, jim <[email protected]> wrote:

> At least initially we would use normal java serialization. Since we
> are talking about using prepend to put the most recent entries in our
> list, we cannot use gzip anymore since it would not gel with
> prepending bytes.
Are you sure?

    % echo hi | gzip > /tmp/blah.gz
    % echo hi again | gzip >> /tmp/blah.gz
    % gzip -dc /tmp/blah.gz
    hi
    hi again

> Did you use a custom serialization scheme? Or do you mean the binary
> serializer? Because at some point you HAVE to serialize something to
> a byte[], right?

A series of 10,000 Java longs serialized with plain Java serialization
ends up being 140,122 bytes of data in my quick test. A textual
(space-separated) representation is up to 209,999 bytes (though it could
be as low as about 20k). A naïve byte-packed representation would be
80,000 bytes. How well any of these compress will vary with your actual
values, of course.

> Did you prepend data at all? I don't see many
> people talking about utilizing the prepend/append methods within the
> protocol, I'm also trying to figure out why this is.

It makes me a bit uncomfortable to have limited visibility into growth,
but it should work.

> Even though it appears the network fetch of even 10,000 longs isn't
> much data, since we have so many of these lists to process it ends up
> becoming a large network hit once you've done it 200k+ times.

Yeah, it could take a few minutes to do them all at the same time if
they were all full and there was no compression.

I don't fully understand your use case. Perhaps you just need a
notification mechanism to invalidate a local cache when the upstream
data changes. I've used memcached for that, with a version number in my
data plus a separate key holding just the version number. On each
request I fetch the small value (just the version number) and compare
it to my internal version number. If it's different or missing, I fetch
the large value and memoize it. If it's the same, I know I'd get the
same value back anyway. When I update, I update the large value first
and then the version number.

I found I had to do this in an application where the bulk of my time
was spent transferring data out of memcached and converting it back
into my in-memory representation.
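For what it's worth, here's a rough sketch of where those size numbers come from. The class and method names (SizeTest, packedSize, boxedSerializedSize) are mine, not anything from this thread, and the serialized size assumes the longs went through a List<Long> (boxed), which is what gets you into 140k territory; a raw long[] serializes much more compactly:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SizeTest {
    /** Naive byte-packing: exactly 8 bytes per long. */
    static int packedSize(long[] values) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (long v : values) out.writeLong(v);
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Plain Java serialization of a List<Long>: after the first element,
     *  each boxed Long costs about 14 bytes on the wire (object tag, class
     *  back-reference, 8-byte value), plus stream and descriptor overhead. */
    static int boxedSerializedSize(long[] values) {
        try {
            List<Long> boxed = new ArrayList<>();
            for (long v : values) boxed.add(v);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.writeObject(boxed);
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] values = new long[10_000];
        for (int i = 0; i < values.length; i++) values[i] = i;
        System.out.println("packed:     " + packedSize(values));   // 80000
        System.out.println("serialized: " + boxedSerializedSize(values));
    }
}
```

The packed figure is exact (10,000 × 8 bytes); the serialized one lands around 140k because of per-object framing.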

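The version-key scheme described above can be sketched roughly like this. The KV interface here is a stand-in for whatever memcached client you use (anything with get/set on string keys), and all the names are hypothetical:

```java
/** Sketch of a version-key invalidation scheme: keep a memoized copy
 *  locally, fetch only a small version key per request, and re-fetch
 *  the large value only when the version differs or is missing. */
public class VersionedCache {
    /** Stand-in for a memcached client; not a real client API. */
    interface KV {
        Object get(String key);
        void set(String key, Object value);
    }

    private final KV remote;
    private final String key;
    private long localVersion = -1;   // -1 = nothing memoized yet
    private Object localValue;

    VersionedCache(KV remote, String key) {
        this.remote = remote;
        this.key = key;
    }

    /** Cheap check against the small version key; expensive fetch on mismatch. */
    Object get() {
        Long v = (Long) remote.get(key + ":version");
        if (v == null || v != localVersion) {
            localValue = remote.get(key);          // the big transfer
            localVersion = (v == null) ? -1 : v;
        }
        return localValue;                          // memoized copy otherwise
    }

    /** Write the large value first, then bump the version number,
     *  so readers never see a new version pointing at old data. */
    void put(Object value, long version) {
        remote.set(key, value);
        remote.set(key + ":version", version);
    }
}
```

Note the write ordering in put(): updating the large value before the version number means a racing reader at worst re-reads stale data once, rather than caching new-version/old-data.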