Hi Antti,
Unfortunately you've stumbled into one of the messier areas of m5. Thanks to our SimpleScalar heritage, the functional and timing aspects of the memory system are almost completely independent, and for historical reasons the memory system behavior varies significantly between the syscall-emulation (SE) and full-system (FS) modes as well. So it's pretty easy to get lost, and in fact I think you stumbled on an interesting "issue" that I wasn't aware of myself. The only good news I can give you is that we're not happy with the situation either and we've been working hard at revamping the memory model to both integrate the functional & timing hierarchies and to make the SE and FS operation more uniform. Of course the bad news is we're not quite there yet... soon, perhaps within a month or two, maybe earlier if you're willing to be a guinea pig for this code.
In SE mode, each process (= workload = Process object) has its own functional memory that is indexed using virtual addresses. This memory uses the MainMemory object in encumbered/mem/functional/main.*. Since functional references for different processes go to different memory objects, there's no need for translation or physical addresses at all.
The dummyTranslation() translation exists solely for the timing hierarchy. The only time we run multiple processes in SE mode is when we're modeling a multiprogrammed workload on our SMT model. If we just used plain virtual addresses, then references to the same address from different processes would be treated identically by the cache (leading to a lot of false hits). (There is an ASID that gets passed around a lot, but that's not used as part of the cache block tag since in FS mode the cache operates on physical addresses, so we don't want to mess with it then.) So the dummyTranslation() munges in the ASID to make the cache lookups thread-aware even though there's no real physical address.
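To make the ASID munging concrete, here is a rough sketch of the idea, not the actual m5 source. It assumes the ASID replaces the upper 16 bits of a 64-bit address (the bits that get cleared); the names dummyTranslationSketch, kAsidShift and kVaddrMask are made up for illustration.

```cpp
#include <cstdint>

using Addr = std::uint64_t;

// Assumption: the low 48 bits come from the virtual address and the
// top 16 bits are overwritten with the ASID. The exact layout in m5
// may differ; this only illustrates the idea.
constexpr int kAsidShift = 48;
constexpr Addr kVaddrMask = (Addr{1} << kAsidShift) - 1;

// Hypothetical stand-in for dummyTranslation(): builds a fake
// "physical" address that is only used to tag cache lookups.
constexpr Addr dummyTranslationSketch(Addr vaddr, std::uint16_t asid)
{
    return (vaddr & kVaddrMask) | (Addr{asid} << kAsidShift);
}

// With ASID 0 and clear upper bits, the address passes through unchanged.
static_assert(dummyTranslationSketch(0x120000000ULL, 0) == 0x120000000ULL,
              "ASID 0 leaves low addresses untouched");

// The same virtual address in two processes yields two distinct tags,
// so the cache no longer sees false hits between them.
static_assert(dummyTranslationSketch(0x120000000ULL, 1) !=
              dummyTranslationSketch(0x120000000ULL, 2),
              "different ASIDs give different cache addresses");
```

Under this assumed layout, an address with non-zero upper bits (such as the 0xfffffffffff96e28 from your report) loses those bits in the result, which is harmless for cache tagging but not for functional accesses.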
So conceptually, functional accesses should use the virtual address as is while cache (timing) accesses should use dummyTranslation(). But in fact in SimpleCPU the translated address is used for both! I think what's happening here is that up to now all the addresses had all 0s in the upper 16 bits, and since SimpleCPU does not support SMT the only ASID it ever sees is 0, so it just happens to work. When you start accessing virtual addresses that are not all 0 in the upper bits you trigger this latent bug. Note that FullCPU has an ugly hack in DynInst::read() where it calls the translation function but then overwrites paddr with vaddr before doing the functional access... this is needed since due to SMT the ASID is not always 0. Your short-term solution of disabling dummyTranslation() is actually fine as long as you don't start running multiprogrammed SMT workloads on the FullCPU model.
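Here is a toy model of that latent bug, again only a sketch under assumptions. SketchRequest, FunctionalMemSketch and the helper functions are hypothetical names, and the 48-bit mask is my guess at the layout rather than something copied from the m5 code.

```cpp
#include <cstdint>
#include <map>

using Addr = std::uint64_t;

// Hypothetical request record carrying both addresses and the ASID.
struct SketchRequest {
    Addr vaddr;
    Addr paddr;
    std::uint16_t asid;
};

// Assumed dummy translation: keep the low 48 bits, put the ASID on top.
inline void translateForTiming(SketchRequest &req)
{
    const Addr lowMask = (Addr{1} << 48) - 1;
    req.paddr = (req.vaddr & lowMask) | (static_cast<Addr>(req.asid) << 48);
}

// Toy per-process functional memory, indexed by virtual address.
class FunctionalMemSketch {
  public:
    void write(Addr addr, std::uint8_t value) { bytes_[addr] = value; }

    // Unwritten locations read back as 0 in this toy model.
    std::uint8_t read(Addr addr) const
    {
        auto it = bytes_.find(addr);
        return it == bytes_.end() ? std::uint8_t{0} : it->second;
    }

  private:
    std::map<Addr, std::uint8_t> bytes_;
};

// What a functional access should do: use the untranslated address.
inline std::uint8_t functionalRead(const FunctionalMemSketch &mem,
                                   const SketchRequest &req)
{
    return mem.read(req.vaddr);
}

// The SimpleCPU-style behavior described above: the translated address
// is reused for the functional access, which misses once the upper
// bits of vaddr are non-zero.
inline std::uint8_t buggyFunctionalRead(const FunctionalMemSketch &mem,
                                        const SketchRequest &req)
{
    return mem.read(req.paddr);
}
```

In this model, a syscall writing its result at 0xfffffffffff96e28 stores it under the virtual address. functionalRead() finds it again, while buggyFunctionalRead() looks at 0x0000fffffff96e28 and sees nothing, which matches the symptom you reported after uname().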
I don't know how ld.so ends up having its stack so high in the address space. Certainly not from m5... if you look at the LiveProcess constructor in sim/process.cc we force the initial SP to be below the stack segment (where the OS normally puts it). It's not too surprising if ld.so does something special on its own.
Part of our new memory system is having a single physical memory even for SE mode, and a somewhat more realistic pagetable in SE mode that lets us map multiple virtual address spaces to that physical memory. So we'll have to fix up mmap soon to make that model work anyway. Yes, the current mmap implementation is a bit sketchy :-).
Steve

Antti P Miettinen wrote:
While trying to run alpha ld.so with M5 in the syscall emulation mode, I noticed something that I don't quite understand. As far as I can follow the code, when the emulated system calls do copyin/copyout they do not do the VA/PA translation step. Normally this apparently is not a problem as the syscall emulation mode uses dummyTranslation(). But ld.so seems to get its stack into the very top of the address space. So when ld.so calls uname() the result gets written to 0xfffffffffff96e28, but when the following code tries to access the result, the access goes through the translation step and dummyTranslation clears the upper 16 bits of the addresses. And the code gets the data from a different address. Just for testing purposes I changed the dummy translation to do nothing and naturally then the accesses happen to the right address but I suppose this is not the right solution. Shouldn't the syscall emulation copyin/copyout be changed to do the translation step? Another question - does ld.so get its stack into the very top of the VA on real alpha? Or is this an anomaly of trying to run a shared object in M5? I suppose making shared libs work would require better mmap() than the current. With current mmap() ld.so thinks everything is statically linked :-)
