Re: [gem5-users] Indeterministic gem5 behavior

Shehab Elsayed Wed, 11 Sep 2019 14:15:21 -0700

Is there a way to get the macroop from the corresponding instruction
pointer?



On Wed, Sep 11, 2019 at 5:07 PM Pouya Fotouhi <[email protected]> wrote:

> Hi Shehab,
>
> Can you please confirm which macroop is issuing that load? I suspect it's
> one of the 128-bit instructions (maybe the non-temporal ones that I recently
> added) that are executed as two 64-bit loads, and possibly the second one is
> failing due to the cda check that we do, which stops the load from being
> committed.
>
> Best,
>
> On Wed, Sep 11, 2019 at 1:16 PM Shehab Elsayed <[email protected]>
> wrote:
>
>> So the load instruction actually gets executed twice, causing the assertion
>> to fail the second time.
>>
>> 7694139490000: system.switch_cpus.iew.lsq.thread0: Doing memory access
>> for inst [sn:15059405] PC (0xffffffff810ed626=>0xffffffff810ed62a).(1=>2)
>> 7694139490000: system.switch_cpus.iew.lsq.thread0: Load [sn:15059405] not
>> executed from fault
>> 7694139490000: system.switch_cpus.iew.lsq.thread0: 1- Setting
>> [sn:15059405] as executed  (I added this message to track when LSQ
>> instructions are set as executed)
>>
>> I believe this instruction should then be committed and removed from the
>> LSQ before being executed again; however, this does not happen. Instead, it
>> gets executed again before being removed, and then the assertion fails
>> because it has already executed.
>>
>> I see that it gets sent to commit
>>
>> 7694139490000: system.switch_cpus.iew: Sending instructions to commit,
>> [sn:15059405] PC (0xffffffff810ed626=>0xffffffff810ed62a).(1=>2).
>>
>> but it never actually gets committed and removed from the LSQ.
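
A minimal sketch of one way to follow a single instruction through such a trace: group the debug lines by sequence number and flag any instruction whose memory access is logged more than once. It assumes each trace line has the "<tick>: <object>: <message>" shape shown in the excerpts above and sits on one line (the excerpts are wrapped by the mail client). The function names are hypothetical, and the last sample line is invented purely to exercise the check.

```python
import re
from collections import defaultdict

# "<tick>: <object path>: <message>", as in the trace excerpts quoted above.
LINE_RE = re.compile(r"^(\d+):\s+([\w.]+):\s+(.*)$")
SN_RE = re.compile(r"\[sn:(\d+)\]")


def events_by_sn(lines):
    """Group (tick, object, message) tuples by instruction sequence number."""
    events = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a debug-trace line
        tick, obj, msg = int(m.group(1)), m.group(2), m.group(3)
        for sn in SN_RE.findall(msg):
            events[int(sn)].append((tick, obj, msg))
    return events


def repeated_accesses(events, marker="Doing memory access"):
    """Return sequence numbers whose trace contains `marker` more than once."""
    return sorted(
        sn for sn, evs in events.items()
        if sum(marker in msg for _, _, msg in evs) > 1
    )


sample = [
    "7694139490000: system.switch_cpus.iew.lsq.thread0: Doing memory access "
    "for inst [sn:15059405] PC (0xffffffff810ed626=>0xffffffff810ed62a).(1=>2)",
    "7694139490000: system.switch_cpus.iew.lsq.thread0: Load [sn:15059405] "
    "not executed from fault",
    "7694139490000: system.switch_cpus.iew: Sending instructions to commit, "
    "[sn:15059405] PC (0xffffffff810ed626=>0xffffffff810ed62a).(1=>2).",
    # Hypothetical second access (tick invented), only to exercise the check:
    "7694139500000: system.switch_cpus.iew.lsq.thread0: Doing memory access "
    "for inst [sn:15059405] PC (0xffffffff810ed626=>0xffffffff810ed62a).(1=>2)",
]

events = events_by_sn(sample)
print(repeated_accesses(events))  # sequence numbers accessed more than once
```

On a real trace, the list for one sequence number (for example events[15059405]) gives the ordered history of that instruction, which makes it easier to see whether a commit message ever appears between the two accesses.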
>>
>>
>> On Mon, Sep 9, 2019 at 3:01 PM Pouya Fotouhi <[email protected]>
>> wrote:
>>
>>> You can try dumping an Exec trace for the last few million ticks to see
>>> what is going on in your LSQ and why you have a load instruction that is
>>> not executed.
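
A small sketch of how the "last few million ticks" could be turned into a gem5 command line, using gem5's --debug-flags, --debug-start and --debug-file options. The helper name, the config script path and the 5,000,000-tick window are assumptions for illustration; the binary path and the failing tick are taken from elsewhere in this thread. Since runs that involve KVM are not deterministic, the failing tick from one run may not carry over to the next, so this is most useful when restoring from a checkpoint.

```python
import shlex


def debug_trace_args(gem5_binary, config_script, fail_tick, window_ticks,
                     flags=("Exec",), trace_file="trace.out"):
    """Build a gem5 argv that starts tracing `window_ticks` before `fail_tick`."""
    start = max(0, fail_tick - window_ticks)  # never ask for a negative tick
    return [
        gem5_binary,
        f"--debug-flags={','.join(flags)}",
        f"--debug-start={start}",
        f"--debug-file={trace_file}",
        config_script,
    ]


args = debug_trace_args(
    "build/X86_MESI_Two_Level/gem5.opt",
    "configs/my_fs_config.py",   # hypothetical config script path
    fail_tick=7694139490000,     # tick of the failing load quoted above
    window_ticks=5_000_000,      # assumed window size
)
print(shlex.join(args))
```

Further debug flags can be passed through the flags tuple; only Exec is named in this thread.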
>>>
>>> Best,
>>>
>>> On Mon, Sep 9, 2019 at 11:28 AM Shehab Elsayed <[email protected]>
>>> wrote:
>>>
>>>> I am not sure that prefetch_nta is the problem. For different runs the
>>>> simulation would fail after different periods after printing the
>>>> prefetch_nta warning message. Also, from what I have seen in different
>>>> posts it seems that this warning has been around for a while.
>>>>
>>>> I tried compiling my hello world program with -march=athlon64 alone and
>>>> together with -O0, and the same problem happens.
>>>>
>>>> Also, I am building my benchmark on the disk image directly using
>>>> qemu, and the gcc on the image is version 5.4.0.
>>>>
>>>>
>>>>
>>>> On Sun, Sep 8, 2019 at 4:14 PM Pouya Fotouhi <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Shehab,
>>>>>
>>>>> Good, that's "progress"!
>>>>> My guess off the top of my head is that you used a "more recent"
>>>>> compiler (compared to what other gem5 users tend to use), and thus some
>>>>> instructions are being generated that were not critical to the execution
>>>>> of applications other users had so far (and that's mostly why those
>>>>> instructions are not yet implemented). I think you have two options:
>>>>>
>>>>>    1. You can try implementing prefetch_nta, and possibly ignore the
>>>>>    non-temporal hint (i.e. implement it as a cacheable prefetch). You can
>>>>>    start by looking at the implementation of the other prefetch
>>>>>    instructions we have in gem5 (basically you can do the same :) ).
>>>>>    2. Try compiling your application (I think we are still talking
>>>>>    about the hello world, right?), and target an older architecture (you
>>>>>    can go as extreme as -march=athlon64) with fewer optimizations involved
>>>>>    to avoid these performance optimizations (reducing cache pollution in
>>>>>    this particular case) that your compiler is trying to apply.
>>>>>
>>>>> My suggestion is to go with the first one, since running real
>>>>> applications compiled for an older architecture with less optimization
>>>>> on a "newer" system is the equivalent of not using "parts/features" of
>>>>> your system (e.g. SIMD units, direct prefetch, etc.), which would
>>>>> (possibly) directly impact any study you are working on.
>>>>>
>>>>> Best,
>>>>>
>>>>> On Sat, Sep 7, 2019 at 8:27 PM Shehab Elsayed <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I am sorry for the late update. I tried running with MESI_Two_Level
>>>>>> but the simulation ends with this error.
>>>>>>
>>>>>> warn: instruction 'prefetch_nta' unimplemented
>>>>>>
>>>>>> gem5.opt: build/X86_MESI_Two_Level/cpu/o3/lsq_unit.hh:621: Fault
>>>>>> LSQUnit<Impl>::read(LSQUnit<Impl>::LSQRequest*, int) [with Impl =
>>>>>> O3CPUImpl; Fault = std::shared_ptr<FaultBase>; LSQUnit<Impl>::LSQRequest
>>>>>> = LSQ<O3CPUImpl>::LSQRequest]: Assertion `!load_inst->isExecuted()'
>>>>>> failed.
>>>>>>
>>>>>> I believe this has something to do with a recent update, since I
>>>>>> don't remember seeing it before. This error happens even with just 2
>>>>>> cores and 2 threads.
>>>>>>
>>>>>> On Fri, Sep 6, 2019 at 3:16 PM Pouya Fotouhi <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Shehab,
>>>>>>>
>>>>>>> As Jason pointed out, I won’t be surprised if you are having issues
>>>>>>> with classic caches running workloads that rely on locking mechanisms.
>>>>>>> Your pthread implementation is possibly using some synchronization
>>>>>>> variables which require cache coherence to maintain correctness, and
>>>>>>> classic caches (at least for now) don’t support that.
>>>>>>>
>>>>>>> Switch to Ruby caches (I suggest MESI Two Level to begin with), and
>>>>>>> given your kernel version you should be getting stable behavior from
>>>>>>> gem5.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> On Fri, Sep 6, 2019 at 11:47 AM Jason Lowe-Power <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Shehab,
>>>>>>>>
>>>>>>>> IIRC, there are some issues when using classic caches + x86 +
>>>>>>>> multiple cores in full system mode. I suggest using Ruby
>>>>>>>> (MESI_Two_Level or MOESI_hammer) for FS simulations.
>>>>>>>>
>>>>>>>> Jason
>>>>>>>>
>>>>>>>> On Fri, Sep 6, 2019 at 11:24 AM Shehab Elsayed <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> My latest experiments are with the classic memory system, but I
>>>>>>>>> remember trying Ruby and it was no different. I am using kernel
>>>>>>>>> 4.8.13 and the ubuntu-16.04.1-server-amd64 disk image. I am using
>>>>>>>>> Pthreads for my Hello World program.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Sep 6, 2019 at 1:13 PM Pouya Fotouhi <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Shehab,
>>>>>>>>>>
>>>>>>>>>> Can you confirm a few details about the configuration you are
>>>>>>>>>> using? Are you using classic caches or Ruby? What is the kernel
>>>>>>>>>> version and disk image you are using? What is the implementation of
>>>>>>>>>> your "multithreaded hello world" (are you using OMP)?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 6, 2019 at 8:58 AM Shehab Elsayed <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> First of all, thanks for your replies, Ryan and Jason.
>>>>>>>>>>>
>>>>>>>>>>> I have already pulled the latest changes by Pouya and the
>>>>>>>>>>> problem still persists.
>>>>>>>>>>>
>>>>>>>>>>> As for checkpointing, I was originally doing exactly what Jason
>>>>>>>>>>> mentioned and ran into the same problem. I then switched to not
>>>>>>>>>>> checkpointing just to avoid any problems that might be caused
>>>>>>>>>>> by checkpointing (if any). My plan was to go back to
>>>>>>>>>>> checkpointing after proving that it works without it.
>>>>>>>>>>>
>>>>>>>>>>> However, the problem doesn't seem to be related to KVM, as Linux
>>>>>>>>>>> boots reliably every time. The problem happens after the benchmark
>>>>>>>>>>> starts execution, and it seems to happen only when running multiple
>>>>>>>>>>> cores (>=4). My latest experiments with a single core and 8 threads
>>>>>>>>>>> for the benchmark seem to be working fine. But once I increase the
>>>>>>>>>>> number of simulated cores, problems happen.
>>>>>>>>>>>
>>>>>>>>>>> Also, I have posted a link to the repo I am using to run my
>>>>>>>>>>> tests in a previous message. I have also added 2 debug traces with
>>>>>>>>>>> the Exec flag, for a working and a non-working example.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 6, 2019 at 11:28 AM Jason Lowe-Power <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Shehab,
>>>>>>>>>>>>
>>>>>>>>>>>> One quick note: There is *no way* to have deterministic
>>>>>>>>>>>> behavior when running with KVM. Since you are using the hardware,
>>>>>>>>>>>> the underlying host OS will influence the execution path of the
>>>>>>>>>>>> workload.
>>>>>>>>>>>>
>>>>>>>>>>>> To try to narrow down the bug you're seeing, you can try to
>>>>>>>>>>>> take a checkpoint after booting with KVM. Then, the execution from
>>>>>>>>>>>> the checkpoint should be deterministic since it is 100% in gem5.
>>>>>>>>>>>>
>>>>>>>>>>>> BTW, I doubt you can run the KVM CPU in a VM since this would
>>>>>>>>>>>> require your hardware and the VM to support nested virtualization.
>>>>>>>>>>>> There *is* support for this in the Linux kernel, but I don't think
>>>>>>>>>>>> it's been widely deployed outside of specific cloud environments.
>>>>>>>>>>>>
>>>>>>>>>>>> One other note: Pouya has pushed some changes which implement
>>>>>>>>>>>> some x86 instructions that were causing issues for him. You can
>>>>>>>>>>>> try with the current gem5 mainline to see if that helps.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Jason
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 6, 2019 at 8:22 AM Shehab Elsayed <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> That's interesting. Are you using Full System as well? I don't
>>>>>>>>>>>>> think FS behavior is supposed to be so dependent on the host
>>>>>>>>>>>>> environment!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Sep 6, 2019 at 11:16 AM Gambord, Ryan <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have found that gem5 behavior is sensitive to the execution
>>>>>>>>>>>>>> environment. I now run gem5 inside an Ubuntu VM on qemu and have
>>>>>>>>>>>>>> had much more consistent results. I haven't tried running KVM
>>>>>>>>>>>>>> gem5 inside a KVM qemu VM, so I'm not sure how that works, but it
>>>>>>>>>>>>>> might be worth trying.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Sep 6, 2019, 08:07 Shehab Elsayed <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was wondering if anyone is running into the same problem
>>>>>>>>>>>>>>> or if anyone has any suggestions on how to proceed with
>>>>>>>>>>>>>>> debugging this problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jul 29, 2019 at 4:57 PM Shehab Elsayed <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sorry for the spam. I just forgot to mention that the
>>>>>>>>>>>>>>>> system configuration I am using is mainly from
>>>>>>>>>>>>>>>> https://github.com/darchr/gem5/tree/jason/kvm-testing/configs/myconfigs.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Shehab Y. Elsayed, MSc.
>>>>>>>>>>>>>>>> PhD Student
>>>>>>>>>>>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer
>>>>>>>>>>>>>>>> Engineering
>>>>>>>>>>>>>>>> University of Toronto
>>>>>>>>>>>>>>>> E-mail: [email protected]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jul 29, 2019 at 4:08 PM Shehab Elsayed <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have set up a repo with gem5 that demonstrates the
>>>>>>>>>>>>>>>>> problem. The repo includes the latest version of gem5 from
>>>>>>>>>>>>>>>>> gem5's GitHub repo with a few patches applied to get KVM
>>>>>>>>>>>>>>>>> working, together with the kernel binary and disk image I am
>>>>>>>>>>>>>>>>> using. You can get the repo at
>>>>>>>>>>>>>>>>> https://github.com/ShehabElsayed/gem5_debug.git.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> These steps should reproduce the problem:
>>>>>>>>>>>>>>>>> 1- scons build/X86/gem5.opt
>>>>>>>>>>>>>>>>> 2- ./scripts/get_fs_stuff.sh
>>>>>>>>>>>>>>>>> 3- ./scripts/run_fs.sh 8
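
Because the hang only shows up in some runs, it can help to measure how often it happens before and after a change. A rough sketch, assuming a wall-clock timeout is an acceptable stand-in for "hung" (a slow but healthy run would be misclassified): the function name and the timeout value are hypothetical, and the demo uses a trivial stand-in command so the sketch runs anywhere.

```python
import subprocess
import sys
from collections import Counter


def classify_runs(cmd, runs, timeout_s):
    """Run `cmd` repeatedly and tally 'ok', 'failed' and 'timeout' outcomes."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the direct child on timeout; processes
            # spawned by a wrapper script may need separate cleanup.
            outcomes["timeout"] += 1
            continue
        outcomes["ok" if result.returncode == 0 else "failed"] += 1
    return outcomes


# Stand-in command; for the repro above this would be something like
# ["./scripts/run_fs.sh", "8"] with a much larger timeout.
demo = classify_runs([sys.executable, "-c", "pass"], runs=3, timeout_s=30)
print(dict(demo))
```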
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have also included sample m5term outputs for both a 2
>>>>>>>>>>>>>>>>> thread run (m5out_2t) and an 8 thread run (m5out_8t)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any help is really appreciated.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jul 23, 2019 at 11:01 AM Shehab Elsayed <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> When I enable the Exec debug flag I can see that it seems
>>>>>>>>>>>>>>>>>> to be stuck in a spin lock (queued_spin_lock_slowpath)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Jul 19, 2019 at 5:28 PM Shehab Elsayed <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello All,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have a gem5 X86 full system setup that starts with
>>>>>>>>>>>>>>>>>>> KVM cores and then switches to O3 cores once the
>>>>>>>>>>>>>>>>>>> benchmark reaches the region of interest. Right now I am
>>>>>>>>>>>>>>>>>>> testing with a simple multithreaded hello world benchmark.
>>>>>>>>>>>>>>>>>>> Sometimes the benchmark completes successfully, while other
>>>>>>>>>>>>>>>>>>> times gem5 just seems to hang after starting the benchmark.
>>>>>>>>>>>>>>>>>>> I believe it is still executing some instructions but
>>>>>>>>>>>>>>>>>>> without making any progress. The chance of this behavior
>>>>>>>>>>>>>>>>>>> (indeterminism) happening increases as the number of
>>>>>>>>>>>>>>>>>>> simulated cores or the number of threads created by the
>>>>>>>>>>>>>>>>>>> benchmark increases.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Any ideas what might be the reason for this or how I can
>>>>>>>>>>>>>>>>>>> start debugging this problem?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Note: I have tried the patch in
>>>>>>>>>>>>>>>>>>> https://gem5-review.googlesource.com/c/public/gem5/+/19568
>>>>>>>>>>>>>>>>>>> but the problem persists.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> gem5-users mailing list
>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Pouya Fotouhi
>>>>>>>>>> PhD Candidate
>>>>>>>>>> Department of Electrical and Computer Engineering
>>>>>>>>>> University of California, Davis