Hi Sebastien,

> OTOH, while your scheme would probably give good results for the
> instruction cache, the data written to the FIFO by the first process
> would miss the data cache when read by the second process because of
> the mismatched task IDs.

Indeed. For the scenario of piping data through a memory page shared
between producer and consumer, a physically indexed D-cache would
provide the best performance. The data could then travel directly
through the cache between producer and consumer, regardless of where
the shared memory is mapped in each process. If I understand correctly,
with a virtually-indexed physically-tagged cache (or task IDs, for that
matter), each physical address could end up in the cache twice (one
copy for each process). Would the cache be designed to associate those
cache lines with each other? How is the consistency between both copies
maintained? I.e., is the cache line of the consumer invalidated when
the producer writes data to its respective cache line?
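To make the aliasing question concrete, here is a small sketch of how a
virtually indexed cache derives its set index. The cache geometry
(8 KiB direct-mapped, 32-byte lines) and the two virtual addresses are
made up for illustration and are not Milkymist specifics:

  #include <stdio.h>

  #define LINE_SIZE  32u   /* bytes per cache line         */
  #define NUM_SETS  256u   /* 8 KiB / 32 B, direct-mapped  */

  /* Set index as a virtually indexed cache would compute it */
  static unsigned set_index(unsigned vaddr)
  {
      return (vaddr / LINE_SIZE) % NUM_SETS;
  }

  int main(void)
  {
      /* The same 4 KiB physical page mapped at two different
       * virtual addresses, e.g. in producer and consumer.    */
      unsigned va_producer = 0x1000;
      unsigned va_consumer = 0x2000;

      /* The index spans 8 KiB (address bits 5..12) but the page
       * offset covers only bits 0..11, so bit 12 depends on the
       * mapping and the two copies land in different sets.     */
      printf("producer set: %u\n", set_index(va_producer));  /* 128 */
      printf("consumer set: %u\n", set_index(va_consumer));  /*   0 */
      return 0;
  }

Note that with a 4 KiB direct-mapped cache and 4 KiB pages (i.e., your
condition of associativity * page size = cache size), bit 12 would no
longer be part of the index and both mappings would select the same
line; the sketch above deliberately violates the condition to exhibit
the two copies.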
> BTW, we are using 4KB I and D caches in the Milkymist SoC, so if we
> fix the page size to 4KB as well we'd avoid aliasing problems
> entirely (the L2 cache would be physically indexed and tagged).

Are you speaking about the MMU page size? In our experience, supporting
different page sizes makes a huge difference for performance when using
a software-loaded TLB because each TLB miss must be resolved in
software. If the TLB has 64 entries and you pin the page size to 4K, a
massive number of page faults is generated as soon as the working set
exceeds 256K of memory. With Genode on MicroBlaze, we started out using
only 4K pages and added support for the whole range of page sizes at a
later stage. The performance (i.e., the time needed to boot a simple
application scenario) improved by a factor of 10!
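For reference, the arithmetic behind the 256K figure: TLB reach is the
number of entries times the page size. The 64 entries and the 4K size
are from above; the larger page sizes are merely illustrative:

  #include <stdio.h>

  /* TLB reach = entries * page size. Beyond that, touching a new
   * page costs a software-handled TLB miss.                      */
  int main(void)
  {
      const unsigned entries = 64;
      const unsigned page_kib[] = { 4, 16, 64 };  /* illustrative */

      for (unsigned i = 0; i < sizeof(page_kib)/sizeof(*page_kib); i++)
          printf("%3u KiB pages -> reach %5u KiB\n",
                 page_kib[i], entries * page_kib[i]);
      return 0;
  }

With 4 KiB pages this yields exactly 64 * 4 KiB = 256 KiB of reach,
which is why the page faults pile up past that working-set size.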
> To sum up: unless I understood something incorrectly, if we use a
> virtually indexed physically tagged cache with:
>
>   cache associativity * page size = cache size
>
> we can happily context switch without taking care of the cache at all
> and without unnecessary cache flushes, cache misses or CPU pipeline
> stalls.

How does this hold true when processes communicate via shared memory as
outlined above?

Cheers
Norman
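P.S. A quick sanity check of your condition against the numbers you
gave (4 KiB cache, 4 KiB pages); the direct mapping is my assumption,
as the associativity wasn't stated:

  #include <stdio.h>

  int main(void)
  {
      const unsigned cache_size = 4096, page_size = 4096;
      const unsigned assoc = 1;  /* assumed direct-mapped */

      /* Line offset plus set index together address cache_size/assoc
       * bytes; if that span fits within one page, all index bits lie
       * in the page offset and virtual and physical index coincide. */
      printf("no aliasing: %s\n",
             cache_size / assoc <= page_size ? "yes" : "no");
      return 0;
  }

This prints "yes" for the Milkymist numbers, so the condition holds per
mapping; my question above is only about the multiple-mapping case.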