Re: [m5sim-users] VA/PA, asid in upper 16 bits

Steve Reinhardt Sat, 03 Dec 2005 20:33:44 -0800


Hi Antti,

Unfortunately you've stumbled into one of the messier areas of m5. Thanks to our SimpleScalar heritage, the functional and timing aspects of the memory system are almost completely independent, and for historical reasons the memory system behavior varies significantly between the syscall-emulation (SE) and full-system (FS) modes as well. So it's pretty easy to get lost, and in fact I think you stumbled on an interesting "issue" that I wasn't aware of myself. The only good news I can give you is that we're not happy with the situation either, and we've been working hard at revamping the memory model both to integrate the functional & timing hierarchies and to make SE and FS operation more uniform. Of course the bad news is we're not quite there yet... soon, perhaps within a month or two, maybe earlier if you're willing to be a guinea pig for this code.

In SE mode, each process (= workload = Process object) has its own functional memory that is indexed using virtual addresses. This memory uses the MainMemory object in encumbered/mem/functional/main.*. Since functional references for different processes go to different memory objects, there's no need for translation or physical addresses at all.
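A minimal Python sketch of the SE-mode arrangement described above, assuming each process simply owns a private, sparse, byte-addressed store. The names `FunctionalMemory` and `Process` here are hypothetical and are not m5's MainMemory/Process interfaces. Because each process has its own store, the same virtual address in two processes never collides, so no translation step is needed on the functional side.

```python
# Hypothetical model: one private functional memory per process, indexed
# directly by virtual address (no VA->PA translation at all).

class FunctionalMemory:
    """Sparse byte-addressed memory keyed by virtual address."""

    def __init__(self):
        self._bytes = {}

    def write(self, vaddr, data):
        for i, b in enumerate(data):
            self._bytes[vaddr + i] = b

    def read(self, vaddr, size):
        # Bytes that were never written read back as zero.
        return bytes(self._bytes.get(vaddr + i, 0) for i in range(size))


class Process:
    def __init__(self, asid):
        self.asid = asid
        self.mem = FunctionalMemory()  # private store per process


p0, p1 = Process(asid=0), Process(asid=1)
p0.mem.write(0x120000000, b"A")
p1.mem.write(0x120000000, b"B")
print(p0.mem.read(0x120000000, 1), p1.mem.read(0x120000000, 1))  # b'A' b'B'
```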

The dummyTranslation() translation exists solely for the timing hierarchy. The only time we run multiple processes in SE mode is when we're modeling a multiprogrammed workload on our SMT model. If we just used plain virtual addresses, then references to the same address from different processes would be treated identically by the cache (leading to a lot of false hits). (There is an ASID that gets passed around a lot, but it's not used as part of the cache block tag, since in FS mode the cache operates on physical addresses and we don't want to mess with it then.) So dummyTranslation() munges in the ASID to make the cache lookups thread-aware even though there's no real physical address.
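A sketch of what an ASID-munging dummy translation might look like. The bit layout is an assumption inferred from the thread's subject line and Antti's description (the low 48 bits of the virtual address are kept and the ASID is placed in the upper 16 bits); `dummy_translation` is a hypothetical stand-in, not m5's dummyTranslation() source. It shows why the same virtual address from two threads yields two distinct cache tags.

```python
VA_BITS = 48                  # assumption: ASID lives in bits 63..48
LOW_MASK = (1 << VA_BITS) - 1

def dummy_translation(vaddr, asid):
    """Hypothetical model: clear the upper 16 bits of vaddr, then OR in the ASID."""
    return ((asid & 0xFFFF) << VA_BITS) | (vaddr & LOW_MASK)

vaddr = 0x120001000
print(hex(dummy_translation(vaddr, 0)))  # 0x120001000
print(hex(dummy_translation(vaddr, 1)))  # 0x1000120001000
```

Note that for ASID 0 and a virtual address whose upper 16 bits are already zero, the "translated" address equals the virtual one.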

So conceptually, functional accesses should use the virtual address as is, while cache (timing) accesses should use dummyTranslation(). But in fact in SimpleCPU the translated address is used for both! I think what's happening here is that up to now all the addresses had all 0s in the upper 16 bits, and since SimpleCPU does not support SMT the only ASID it ever sees is 0, so it just happens to work. When you start accessing virtual addresses that are not all 0 in the upper bits, you trigger this latent bug. Note that FullCPU has an ugly hack in DynInst::read() where it calls the translation function but then overwrites paddr with vaddr before doing the functional access... this is needed since, due to SMT, the ASID is not always 0. Your short-term solution of disabling dummyTranslation() is actually fine as long as you don't start running multiprogrammed SMT workloads on the FullCPU model.
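The latent bug can be reproduced in a small model, assuming the same hypothetical layout as before (low 48 bits of the virtual address kept, ASID in the upper 16 bits). The names `copyout`, `load_buggy`, `load_fixed` and `cache_access` are hypothetical. The syscall layer writes functional memory at the raw virtual address; the buggy load looks up the translated address, while the fixed load sends the translated address only to the timing side and uses the virtual address as is for the functional access.

```python
mem = {}          # functional memory, keyed by virtual address
timing_log = []   # tags seen by the (stand-in) timing hierarchy

def dummy_translation(vaddr, asid):
    # Hypothetical model: ASID in bits 63..48, low 48 bits of vaddr kept.
    return ((asid & 0xFFFF) << 48) | (vaddr & ((1 << 48) - 1))

def cache_access(paddr):
    # Stand-in for the timing hierarchy: just record the tag it was given.
    timing_log.append(paddr)

def copyout(vaddr, value):
    # Syscall emulation writes functional memory with the untranslated address.
    mem[vaddr] = value

def load_buggy(vaddr, asid):
    paddr = dummy_translation(vaddr, asid)
    cache_access(paddr)
    return mem.get(paddr)    # bug: functional access uses the translated address

def load_fixed(vaddr, asid):
    cache_access(dummy_translation(vaddr, asid))  # timing: thread-aware tag
    return mem.get(vaddr)    # functional: virtual address as is

stack_va = 0xFFFFFFFFFFF96E28
copyout(stack_va, "uname result")
print(load_buggy(stack_va, asid=0))  # None (the read went to 0xfffffff96e28)
print(load_fixed(stack_va, asid=0))  # uname result
```

With a low address and ASID 0, both loads agree, which is why the bug stayed hidden; a nonzero ASID breaks the buggy path even for low addresses.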

I don't know how ld.so ends up having its stack so high in the address space. Certainly not from m5... if you look at the LiveProcess constructor in sim/process.cc, we force the initial SP to be below the stack segment (where the OS normally puts it). It's not too surprising if ld.so does something special on its own.

Part of our new memory system is having a single physical memory even for SE mode, and a somewhat more realistic page table in SE mode that lets us map multiple virtual address spaces to that physical memory. So we'll have to fix up mmap soon to make that model work anyway. Yes, the current mmap implementation is a bit sketchy :-).
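A rough sketch of the direction described above: one shared physical memory plus a per-process page table that maps virtual pages to physical frames. The 8 KB page size (Alpha's base page size) and allocate-on-first-touch frames are simplifying assumptions, and `PhysicalMemory`/`PageTable` are hypothetical names, not classes from the new m5 memory system.

```python
PAGE_SHIFT = 13               # assumption: 8 KB pages, as on Alpha
PAGE_SIZE = 1 << PAGE_SHIFT

class PhysicalMemory:
    """One sparse, byte-addressed memory shared by all processes."""

    def __init__(self):
        self.data = {}
        self.next_ppn = 0

    def alloc_frame(self):
        ppn = self.next_ppn
        self.next_ppn += 1
        return ppn

class PageTable:
    """Per-process map from virtual page number to physical page number."""

    def __init__(self, phys):
        self.phys = phys
        self.entries = {}

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & (PAGE_SIZE - 1)
        if vpn not in self.entries:
            # Simplification: allocate a frame on first touch.
            self.entries[vpn] = self.phys.alloc_frame()
        return (self.entries[vpn] << PAGE_SHIFT) | offset

    def write(self, vaddr, value):
        self.phys.data[self.translate(vaddr)] = value

    def read(self, vaddr):
        return self.phys.data.get(self.translate(vaddr))

phys = PhysicalMemory()
pt0, pt1 = PageTable(phys), PageTable(phys)
pt0.write(0x120000000, "p0")
pt1.write(0x120000000, "p1")
print(hex(pt0.translate(0x120000000)), hex(pt1.translate(0x120000000)))  # 0x0 0x2000
```

Because both functional and timing accesses would see the same real physical address, the ASID munging becomes unnecessary in this model.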


Steve

Antti P Miettinen wrote:

While trying to run alpha ld.so with M5 in the syscall emulation mode,
I noticed something that I don't quite understand. As far as I can
follow the code, when the emulated system calls do copyin/copyout they
do not do the VA/PA translation step.

Normally this apparently is not a problem as the syscall emulation
mode uses dummyTranslation(). But ld.so seems to get its stack into
the very top of the address space. So when ld.so calls uname() the
result gets written to 0xfffffffffff96e28, but when the following code
tries to access the result, the access goes through the translation
step and dummyTranslation clears the upper 16 bits of the addresses.
And the code gets the data from a different address.

Just for testing purposes I changed the dummy translation to do
nothing and naturally then the accesses happen to the right address
but I suppose this is not the right solution. Shouldn't the syscall
emulation copyin/copyout be changed to do the translation step?

Another question - does ld.so get its stack into the very top of the
VA on real alpha? Or is this an anomaly of trying to run a shared
object in M5?

I suppose making shared libs work would require a better mmap() than the
current one. With the current mmap(), ld.so thinks everything is
statically linked :-)




_______________________________________________
m5sim-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/m5sim-users
