[gem5-users] Re: gem5 crash when mount by vio-9p protocol in KVM mode with more than 1 core

Andreas Sandberg via gem5-users Mon, 22 Mar 2021 11:50:18 -0700

Hi Charlie,

If memory serves me right, you shouldn't need to do anything in the VirtIO 
devices themselves when running on KVM unless there is a bug somewhere (which I 
think there is, see the last paragraph).


The following discussion assumes that all devices live in a single event queue 
and that KVM CPUs, and only KVM CPUs, have their own event queues. I have 
typically used something like event queue 0 for devices and the KVM VM (KVM 
CPUs assume that devices use the same event queue as the KVM VM). This may be 
slightly different if you want to simulate multiple systems without shared 
memory.

When encountering an MMIO operation, the CPU will automatically switch to the
device's event queue by using the event queue from the KVM VM it is
associated with. Interrupts are a bit more complicated, but they should
normally originate from the device event queue. There are basically two ways an
interrupt can happen: either it is raised as a direct consequence of an MMIO, or
it happens asynchronously in the device. In both cases, you will be executing
in the device's event queue. The CPU ensures that's the case when performing an
MMIO, and asynchronous interrupts happen in response to an event in the
device's event queue.

Because of the behaviour described above, devices normally never need to
explicitly migrate between event queues.

A quick look at the code suggests that one "unique" thing about the 9P device 
is that it uses the poll queue. I wouldn't be surprised if asynchronous IO handling 
(SIGIO and other signals) has bugs when using multiple threads in gem5. By default, the 
code in src/base/pollevent.cc wakes up the first event queue to service the event. 
However, if you look at src/sim/simulate.cc, it seems like any event queue may service 
events in the poll queue. This is potentially a big issue for devices that use the poll 
queue. A quick workaround for that issue is to add a scoped migration to the device's 
event queue in VirtIO9PDiod::DiodDataEvent::process(int).
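The poll-queue problem can be illustrated with a small standalone C++ model of
why a handler serviced by the wrong thread may trip the `when >= getCurTick()`
assertion from the original report. This is a hypothetical sketch, not gem5
code: `Queue`, `current`, `curTick()`, `scheduleOn()`, `Migrate`,
`handleData()` and `servicePollEvent()` are illustrative stand-ins for gem5's
event queues, its per-thread current queue, `EventQueue::schedule()` and
`EventQueue::ScopedMigration`, and all locking is left out. It assumes that
`curTick()` reports the tick of whichever queue the calling thread is bound to,
so a handler running on a lagging KVM CPU queue computes a time that is already
in the past for the device queue.

```cpp
#include <cassert>
#include <cstdint>

using Tick = std::uint64_t;

// Hypothetical stand-in for a gem5 event queue; only its current tick matters.
struct Queue {
    Tick tick;
};

// Hypothetical per-thread "current queue", standing in for the state that
// gem5's curTick() consults.
thread_local Queue *current = nullptr;

Tick curTick()
{
    assert(current != nullptr && "thread is not bound to any queue");
    return current->tick;
}

// Models EventQueue::schedule(): returns false where gem5 would instead fail
// its `when >= getCurTick()` assertion and abort.
bool scheduleOn(const Queue &q, Tick when) { return when >= q.tick; }

// Hypothetical RAII helper in the spirit of ScopedMigration (no locking):
// make `to` the current queue and restore the previous one on scope exit.
class Migrate {
  public:
    explicit Migrate(Queue &to) : saved(current) { current = &to; }
    ~Migrate() { current = saved; }
    Migrate(const Migrate &) = delete;
    Migrate &operator=(const Migrate &) = delete;

  private:
    Queue *saved;
};

// A device data handler that schedules a follow-up event `delay` ticks
// after "now", optionally migrating to the device's queue first.
bool handleData(Queue &device, Tick delay, bool migrate)
{
    if (migrate) {
        Migrate m(device);  // "now" is now the device queue's tick
        return scheduleOn(device, curTick() + delay);
    }
    // "now" is whatever queue the servicing thread happens to be bound to.
    return scheduleOn(device, curTick() + delay);
}

// One event-queue thread servicing a poll-queue event: it runs bound to its
// own queue (`servicing`), which need not be the device's queue.
bool servicePollEvent(Queue &servicing, Queue &device, Tick delay, bool migrate)
{
    Migrate bind(servicing);
    return handleData(device, delay, migrate);
}
```

With illustrative values `Queue device{1000}` and `Queue cpu1{400}`,
`servicePollEvent(cpu1, device, 100, false)` computes tick 500 and fails the
check, while `servicePollEvent(cpu1, device, 100, true)` computes tick 1100 and
succeeds. In real gem5 code, the `migrate` branch corresponds to the scoped
migration suggested above for `VirtIO9PDiod::DiodDataEvent::process(int)`.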

Cheers,
Andreas

On 16/03/2021 06:04, Gabe Black wrote:
Basically you want to make sure you've moved to the right event queue by the 
time any code you call tries to schedule an event. The VirtIO devices 
themselves don't seem to, but the code they're calling (interacting with other 
devices, sending transactions to the memory system) could be. If it doesn't 
work in kick(), then there may be a call earlier which is causing problems. 
This is only necessary on code paths that are called asynchronously, like when 
you get poked from the other process running the diod daemon for instance. 
Basically you want to always and only create a ScopedMigration when called 
asynchronously and when something that happens during that asynchronous call 
might directly or indirectly cause an event to be scheduled on the event queue.

I think Andreas has a lot of familiarity with this area, so hopefully he can 
chime in and let us know if we're on the right track.

Gabe

On Mon, Mar 15, 2021 at 10:39 PM Liyichao <liyic...@huawei.com> wrote:
Or do you mean the ScopedMigration needs to be declared in
VirtIO9PBase::sendRMsg in fs9p.cc before kick() is called?


void
VirtIO9PBase::sendRMsg(const P9MsgHeader &header, const uint8_t *data, size_t size)
{
   DPRINTF(VIO9P, "Sending RMsg\n");
   EventQueue::ScopedMigration migrate(eventQueue());
   dumpMsg(header, data, size);
   DPRINTF(VIO9P, "\tPending transactions: %i\n", pendingTransactions.size());
   assert(header.len >= sizeof(header));

   VirtDescriptor *main_desc(pendingTransactions[header.tag]);
   pendingTransactions.erase(header.tag);

   // Find the first output descriptor
   VirtDescriptor *out_desc(main_desc);
   while (out_desc && !out_desc->isOutgoing())
       out_desc = out_desc->next();
   if (!out_desc)
       panic("sendRMsg: Framing error, no output descriptor.\n");

   P9MsgHeader header_out(htop9(header));
   header_out.len = htop9(sizeof(P9MsgHeader) + size);

   out_desc->chainWrite(0, (uint8_t *)&header_out, sizeof(header_out));
   out_desc->chainWrite(sizeof(header_out), data, size);

   queue.produceDescriptor(main_desc, sizeof(P9MsgHeader) + size);
   kick();
}


________________________________
李翼超 (Charlie)

Huawei Technologies Co., Ltd.
Department: Computing System and Component Development Dept. [Cloud & Computing BG]
________________________________

From: Liyichao
Sent: 16 March 2021 13:30
To: 'Gabe Black' <gabe.bl...@gmail.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: RE: [gem5-users] gem5 crash when mount by vio-9p protocol in KVM mode with more than 1 core

Hi Gabe,
        Do you mean the code should be modified like this?

void
PciVirtIO::kick()
{
   DPRINTF(VIOIface, "kick(): Sending interrupt...\n");
   EventQueue::ScopedMigration migrate(eventQueue());
   interruptDeliveryPending = true;
   intrPost();
}


void
MmioVirtIO::kick()
{
   DPRINTF(VIOIface, "kick(): Sending interrupt...\n");
   EventQueue::ScopedMigration migrate(eventQueue());
   setInterrupts(interruptStatus | INT_USED_RING);
}

From: Gabe Black <gabe.bl...@gmail.com>
Sent: 16 March 2021 08:44
To: Liyichao <liyic...@huawei.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: [gem5-users] gem5 crash when mount by vio-9p protocol in KVM mode with more than 1 core

I think what you want to do is in the kick() functions in MmioVirtIO and 
PciVirtIO, you want to declare a ScopedMigration at the start of the function, 
and pass its constructor the result of the eventQueue() method. The SimObject 
class inherits from EventManager and knows what event queue it's supposed to 
use, and that's what eventQueue returns. When you declare a ScopedMigration, it 
will handle the locking correctly and move over to that queue, and when it goes 
out of scope (at the end of the function) it will put everything back.
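What Gabe describes here (take the target queue's lock, switch to it, and put
everything back when the scope ends) is the RAII idiom. A rough standalone C++
model of it follows. `EventQueueModel`, `curQueue`, `ScopedMigrationModel` and
`kickModel()` are hypothetical names, and the unlock/lock ordering is an
assumption about how such a migration can work, not a copy of gem5's
`EventQueue::ScopedMigration`.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, heavily simplified event queue: a lock that the servicing
// thread holds while it runs events, plus a count of scheduled events.
struct EventQueueModel {
    std::mutex serviceLock;
    unsigned scheduled = 0;

    void lock() { serviceLock.lock(); }
    void unlock() { serviceLock.unlock(); }
};

// The queue the calling thread is currently executing in (hypothetical).
thread_local EventQueueModel *curQueue = nullptr;

static EventQueueModel &currentQueue()
{
    assert(curQueue != nullptr && "thread is not running in any queue");
    return *curQueue;
}

// RAII migration: release the queue we were running in, take the target
// queue's lock and make it current; undo all three steps on scope exit.
class ScopedMigrationModel {
  public:
    explicit ScopedMigrationModel(EventQueueModel &target)
        : newQ(target), oldQ(currentQueue()), migrating(&newQ != &oldQ)
    {
        if (migrating) {
            oldQ.unlock();
            newQ.lock();
            curQueue = &newQ;
        }
    }

    ~ScopedMigrationModel()
    {
        if (migrating) {
            newQ.unlock();
            oldQ.lock();
            curQueue = &oldQ;
        }
    }

    ScopedMigrationModel(const ScopedMigrationModel &) = delete;
    ScopedMigrationModel &operator=(const ScopedMigrationModel &) = delete;

  private:
    EventQueueModel &newQ;
    EventQueueModel &oldQ;
    bool migrating;  // false when already running in the target queue
};

// A hypothetical kick(): migrate to the device's queue before touching it.
void kickModel(EventQueueModel &deviceQueue)
{
    ScopedMigrationModel migrate(deviceQueue);
    assert(curQueue == &deviceQueue);
    ++deviceQueue.scheduled;  // safe: this thread holds deviceQueue's lock
}
```

Because the destructor restores the previous queue, an early `return` inside
`kickModel()` still leaves the thread back in its original queue with that
queue's lock held. In this model, migrating to the queue the thread is already
running in is a no-op.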

Please give that a try, and if it works for you (I don't have a way to test it 
myself) put up a review so we can get a fix checked in.

Gabe

On Mon, Mar 15, 2021 at 5:28 PM Liyichao <liyic...@huawei.com> wrote:
Thanks for your explanation. With the O3 CPU type and multiple cores, 9P works
fine.


From: Gabe Black <gabe.bl...@gmail.com>
To: gem5 users mailing list <gem5-users@gem5.org>
Cc: Liyichao <liyic...@huawei.com>
Subject: Re: [gem5-users] gem5 crash when mount by vio-9p protocol in KVM mode with more than 1 core
Date: 2021-03-16 07:24:15

I haven't looked at the code yet, but this is probably because the 9P
implementation is getting asynchronous input which might be received by one
thread, which then tries to schedule an event on an event queue associated with
another thread. Most of the time this is not an issue since gem5 is usually
single threaded, but when using multiple cores with KVM, each core runs in its
own thread. There's a way to add events to the event queue in another thread
safely (ScopedMigration) which I'm assuming the 9P code is not using.

Gabe

On Sun, Mar 7, 2021 at 8:38 PM Liyichao via gem5-users <gem5-users@gem5.org> wrote:
Hi all,

        When I use --vio-9p with fs_bigLITTLE.py, the mount command works with
1 core, but with more than 1 core the mount command crashes gem5.

        1 core:
        gem5 command:
        ./build/ARM/gem5.opt --debug-flags=Exec -d /home/l00515693/m5out \
            configs/example/arm/fs_bigLITTLE.py --cpu-type=kvm \
            --kernel=vmlinux --machine-type=VExpress_GEM5_V1 \
            --disk=expanded-aarch64-ubuntu-trusty-headless.img --caches \
            --big-cpu-clock=2.6GHz --little-cpu-clock=2.6GHz \
            --big-cpus=1 --little-cpus=0 --mem-size=4GB \
            --param 'system.realview.gic.gem5_extensions = True' \
            --bootscript=./test.rcS --vio-9p

        Mount command in guest OS:
        mount -t 9p -o trans=virtio,version=9p2000.L,aname=/home/l00515693/m5out/9p/share gem5 /mnt/9p

root@charlie:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       9.9G  3.3G  6.2G  35% /
devtmpfs        2.0G  4.0K  2.0G   1% /dev
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            396M   52K  396M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            2.0G     0  2.0G   0% /run/shm
none            100M     0  100M   0% /run/user
gem5            2.9T  1.8T  989G  65% /mnt/9p



        2 cores:
        gem5 command:
        ./build/ARM/gem5.opt --debug-flags=Exec -d /home/l00515693/m5out \
            configs/example/arm/fs_bigLITTLE.py --cpu-type=kvm \
            --kernel=vmlinux --machine-type=VExpress_GEM5_V1 \
            --disk=expanded-aarch64-ubuntu-trusty-headless.img --caches \
            --big-cpu-clock=2.6GHz --little-cpu-clock=2.6GHz \
            --big-cpus=2 --little-cpus=0 --mem-size=4GB \
            --param 'system.realview.gic.gem5_extensions = True' \
            --bootscript=./test.rcS --vio-9p

        Mount command in guest OS:
        mount -t 9p -o trans=virtio,version=9p2000.L,aname=/home/l00515693/m5out/9p/share gem5 /mnt/9p

gem5 crash info:
gem5.opt: build/ARM/sim/eventq_impl.hh:40: void EventQueue::schedule(Event*, Tick, bool): Assertion `when >= getCurTick()' failed.
Program aborted at tick 476281849910020
--- BEGIN LIBC BACKTRACE ---
./build/ARM/gem5.opt(_Z15print_backtracev+0x40)[0xaaaadd24e418]
./build/ARM/gem5.opt(_Z12abortHandleri+0x5c)[0xaaaadd25efa4]
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffff9fea5688]
/lib/aarch64-linux-gnu/libc.so.6(raise+0xb0)[0xffff9f6634f8]
--- END LIBC BACKTRACE ---
Aborted (core dumped)

_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org