Hi Mikhail,

1. As for item 2 on the list: Chilimbi actually does use the GC to move objects and improve cache locality [Profile-guided Proactive Garbage Collection for Locality Optimization], although not with his algorithm. I will read the paper again. His algorithm mainly focuses on applications written in C or C++. Maybe we can make use of the GC to improve the effect of the algorithm.

2. I read Chilimbi's papers again and worked out a framework, which I will describe here.

First, I want to follow Chilimbi's Bursty Tracing framework [Bursty Tracing: A Framework for Low-Overhead Temporal Profiling]. It keeps two versions of the same code: one for checking and one for instrumentation. The instrumented code records all memory accesses; this record is called the data reference trace. What I want to do is modify the instrumented code: once nCheck reaches 0, we activate the performance counters to profile the cache miss rate and the cache miss addresses of the instrumented code until nInstr reaches 0. We set a threshold, and instructions whose miss rate exceeds it are treated as delinquent loads or delinquent stores. Since we only activate the performance counters for short periods, this should not add much overhead.

Second, we abstract the delinquent instructions, output the WPS (whole program stream), and then use the SEQUITUR algorithm to process it. The complexity of the algorithm is O(EL). In Chilimbi's paper [Efficient Representations and Abstractions for Quantifying and Exploiting Data Reference Locality], he reduces the trace by processing the WPS twice, and I think we can follow his approach.

These are the main differences from Chilimbi's algorithm. There are surely some deficiencies, and we can discuss them.

--------------
Best Regards,
Qiong, Zou
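
P.S. To make the nCheck/nInstr idea above more concrete, here is a rough Python sketch of the sampling step. It is only a toy model under several assumptions: the names `profile` and `delinquent` are made up for illustration, every memory event is treated as a check point (the real framework places its checks in the code, not at every access), and the `missed` flag stands in for whatever the performance counters would report.

```python
from collections import defaultdict


def profile(events, n_check0, n_instr0):
    """Toy model of the nCheck/nInstr switch (hypothetical helper).

    events: iterable of (pc, missed) pairs, where missed is 1 for a
    cache miss and 0 for a hit. Only events seen while the
    "instrumented" copy is active are counted.
    """
    accesses = defaultdict(int)
    misses = defaultdict(int)
    n_check, n_instr = n_check0, 0
    for pc, missed in events:
        if n_instr > 0:
            # Instrumented copy: the counters are active.
            accesses[pc] += 1
            misses[pc] += missed
            n_instr -= 1
            if n_instr == 0:
                n_check = n_check0  # fall back to the checking copy
        else:
            # Checking copy: only count down, record nothing.
            n_check -= 1
            if n_check == 0:
                n_instr = n_instr0  # start a new burst
    return accesses, misses


def delinquent(accesses, misses, threshold):
    """Instructions whose sampled miss rate exceeds the threshold."""
    return sorted(pc for pc in accesses
                  if misses[pc] / accesses[pc] > threshold)
```

With n_check0 = 2 and n_instr0 = 3, for example, the loop skips two events, samples the next three, and repeats. The delinquent instructions returned at the end would be the ones whose references are passed on to SEQUITUR.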