Thanks to all who commented on my early results. I have added the
results of a mixed workload containing both Zipf and scan accesses. I
followed the example in the 2Q paper, where they tried out scans of
different lengths. I used a 10,000-item cache with a 100,000-item
dataset and ran a mixed workload of 0.8 Zipf with 33% scans of lengths
(0, 10, 100, 1000, 10000). The resulting graph is available as PNG and
PDF. The results show a significant benefit from using the CART
algorithm.
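
For anyone who wants to reproduce the setup, here is a rough sketch of
how such a mixed workload can be generated. This is illustrative only,
not my prototype code; the class name, the Cache interface, and the
exact rule for mixing scans into the Zipf stream are all assumptions on
my part.

import java.util.Random;

// Sketch of a mixed-workload driver: a Zipf(theta) reference stream
// with sequential scans of a fixed length blended in for a fraction
// of the references. Names and mixing rule are illustrative only.
public class MixedWorkload {
    private final double[] cdf;        // cumulative Zipf probabilities
    private final Random rnd = new Random();
    private final int items;

    public MixedWorkload(int items, double theta) {
        this.items = items;
        cdf = new double[items];
        double sum = 0.0;
        for (int i = 0; i < items; i++) {
            sum += 1.0 / Math.pow(i + 1, theta);
            cdf[i] = sum;
        }
        for (int i = 0; i < items; i++) {
            cdf[i] /= sum;             // normalize to a proper CDF
        }
    }

    /** Next key drawn from the Zipf(theta) distribution. */
    public int nextZipf() {
        double u = rnd.nextDouble();
        int lo = 0, hi = items - 1;
        while (lo < hi) {              // binary search in the CDF
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] < u) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Issue 'requests' references; roughly scanFraction of them come
     *  from sequential scans of scanLen keys (one possible reading of
     *  "33% scans"). */
    public void run(Cache cache, long requests, double scanFraction, int scanLen) {
        int scanPos = 0, scanLeft = 0;
        for (long i = 0; i < requests; i++) {
            if (scanLen > 0 && rnd.nextDouble() < scanFraction) {
                if (scanLeft == 0) {   // start a new scan at a random key
                    scanPos = rnd.nextInt(items);
                    scanLeft = scanLen;
                }
                cache.access(scanPos % items);
                scanPos++;
                scanLeft--;
            } else {
                cache.access(nextZipf());
            }
        }
    }

    /** Minimal interface a cache prototype would implement. */
    public interface Cache {
        void access(int key);
    }
}

Calling run(cache, n, 0.33, scanLen) for scanLen in {0, 10, 100, 1000,
10000} would approximate the five mixes described above.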

Earlier, I was leaning towards the 2Q algorithm, but I found that it
carries a significant synchronization penalty. The Postgres community
implemented 2Q (in 8.0) after they found out that ARC was patented by
IBM. Since then, they have moved to Clock (in 8.1), mostly because of
the contention penalty in 2Q. Since CART is a cousin of Clock, it may
have less overhead.
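
To make the contention point concrete, here is a simplified sketch
(not Derby code and not my prototypes; the class names are just for
illustration): an LRU-based policy such as 2Q has to reorder a shared
list on every hit, which means taking a lock, whereas a Clock-based
policy such as CAR/CART only sets a per-entry reference bit on a hit
and leaves the list work to eviction time.

import java.util.LinkedHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustration of the contention argument (simplified, not Derby code):
// an LRU-style hit reorders a shared list under a lock, while a
// Clock-style hit only flips a per-entry reference bit.
public class HitPathSketch {

    /** LRU-style hit path: the list is one shared structure, so every
     *  hit serializes on a single monitor just to move the entry to
     *  the front. */
    static class LruCache<K, V> {
        private final LinkedHashMap<K, V> map =
            new LinkedHashMap<>(16, 0.75f, true);   // access-order list

        synchronized V get(K key) {
            return map.get(key);   // get() relinks the entry -> needs the lock
        }
    }

    /** Clock-style hit path: a hit only sets the entry's reference bit;
     *  the clock hand inspects the bits later, during eviction, so hits
     *  on different entries do not contend with each other. */
    static class ClockEntry<V> {
        final V value;
        final AtomicBoolean referenced = new AtomicBoolean(false);

        ClockEntry(V value) { this.value = value; }

        V hit() {
            referenced.set(true);  // no list manipulation, no global lock
            return value;
        }
    }
}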

See the wiki for the graphs.
Link: http://wiki.apache.org/db-derby/DerbyLruCacheManager

Thanks,

Gokul

On 6/5/06, Manish Khettry <[EMAIL PROTECTED]> wrote:
From the graphs it does look like we can gain a little performance by
switching the cache replacement policy.  A few thoughts on the wiki page and
the work you've done so far.

Would it be possible to have pointers to papers that describe these
algorithms on the wiki page? I could find CAR
(http://www.almaden.ibm.com/cs/people/dmodha/clockfast.pdf) and 2Q
(http://www.vldb.org/conf/1994/P439.PDF), but not CART.

From looking at the graphs, it looks like CART and 2Q are the two real
choices. However, running a few more experiments may not be a bad idea.
Knut mentions adding scans to the Zipf mix, and that may be worthwhile;
the original 2Q paper does in fact mix scans of different lengths with
one of the Zipf workloads. Also, given that the two seem to converge
around a 10K cache size (or 10% of the dataset size), can we go up to 20
or 40% and see if one policy is measurably better than the other?

If there isn't a whole lot of difference in performance, then it makes
sense to go with the simpler implementation and/or consider
synchronization concerns; I'm guessing that 2Q (based on LRU) will
involve more synchronization than CART (Clock-based).

Those papers and your results were pretty interesting reading. I too am
curious about the zig-zag pattern on the Zipf 0.8 dataset.

Thanks,
Manish



On 6/4/06, Gokul Soundararajan <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm a Google SoC student working on creating a CacheManager with a
> better cache replacement algorithm. After talking with my mentor
> (Oystein Grovlen), I have created a bunch of small prototypes to
> evaluate different algorithms found in the literature.
>
> I have run them for Zipf workloads with parameters 0.5 and 0.8 and
> posted the results on the Derby Wiki.
> Link: http://wiki.apache.org/db-derby/DerbyLruCacheManager
> The results show that the 2Q, CAR, and CART algorithms perform better
> than LRU and Clock.
>
> I would like to get some feedback on my results to help me decide what
> I should implement in Derby.
>
> Thanks,
>
> Gokul
>

