Hi Adrian,

On 30.08.2017 09:58, [email protected] wrote:
> I have tracked this down to the kernel failing to unmap pages in some cases 
> when deleting frame caps, specifically it would only unmap if deleting the 
> last capability to a frame, which is clearly wrong. As you say, this was only 
> able to manifest as a problem because your user space was already 
> miss-behaving and attempting to access memory that it shouldn't, because it 
> should've already been unmapped and just caused a user fault.
> 
> This bug would have been caught eventually by verification, they just have 
> not gotten to that point of the C refinement for x86-64 yet. Note that this 
> bug is *not* present on ARM.
> 
> Fix should be out on master shortly.

Great ! I'm looking forward to the updated kernel.

Thanks,

Alex.

> 
> Thanks again for reporting this.
> 
> Adrian
> 
> On Fri 25-Aug-2017 6:49 AM, Alexander Boettcher wrote:
> 
> Hello,
> 
> we're are in progress of extending our Genode/seL4 support, namely
> adding x86_64 and ARM support.
> 
> During this work I encountered on x86_64 some sporadically crashes of
> the seL4 kernel. It happened during the debugging of our roottask on
> seL4 in form of a page fault in the kernel ("KERNEL EXCEPTION") or a
> static assertion in the kernel (see below).
> 
> From my understanding, this shouldn't be possible based on your
> verification work. Nevertheless, it happens reliable.
> 
> (To be fair, the roottask was in this scenario already a bit confused
> and did things wrong - which however should never be enough to force the
> kernel in an exception or assertion - I presume.)
> 
> I saved and stripped down the scenario which causes the kernel exception
> with high reliably and hope you can re-produce it and find the root
> cause. I stripped down the scenario to just the Genode image binary and
> the steps we did to create the seL4 kernel from the vanilla sources.
> 
> Steps to reproduce:
> 
>  qemu-system-x86_64 -no-kvm -serial stdio -m 512 -kernel sel4 -initrd
> bomb_image.elf
> 
> The binaries of "bomb_image.elf" can be found at [1] and the kernel
> "sel4" at [2].
> 
> The kernel is based on seL4 6.0 and a patch to pc99/autoconf.h (see
> [0]). Beside that it is the vanilla 6.0 kernel:
> 
>  https://github.com/seL4/seL4.git
> 
>  git commit 8564ace4dfb622ec69e0f7d762ebfbc8552ec918
>  "update VERSION file to 6.0.0"
> 
>  patch -p1 <~autoconfig.patch
> 
>  BOARD=x86_64 ARCH=x86 SEL4_ARCH=x86_64 PLAT=pc99 DEBUG=1 make
> 
>  objcopy -O elf32-i386 kernel.elf.strip sel4
>  (Qemu complains otherwise about 64bit elf for boot)
> 
> The final "sel4" binary can be used with the "bomb_image.elf" and qemu
> as pointed out above.
> 
> I tried Qemu 2.5.0 as shipped with Ubuntu 16.04 LTS and Qemu 2.8.0
> (self-compiled). On real hardware the scenario typically leads after a
> while (kernel booted, roottask booted, scenario starts to run) to a
> reboot (without the messages or assertion as in the Qemu case, see below).
> 
> The tool-chain for Genode is based on GNU GCC 6.3.0 (see [3] and [4]).
> 
> The kernel was either compiled with Genode's toolchain (6.3.0) or with
> the standard GCC as provided by Ubuntu 16.04 LTS (GCC 5.4.0).
> 
> On x86_32 I could not reproduce the issue. (On ARM I did not try.)
> 
> I hope the information are helpful - if you need more just ask.
> 
> 
> Cheers,
> 
> Alex.
> 
> 
> 
> ========== KERNEL EXCEPTION ==========
> Vector:  0xe
> ErrCode: 0x0
> IP:      0xffffffff80720e60
> SP:      0xffffffff80740e08
> FLAGS:   0x6
> CR0:     0x8000003b
> CR2:     0x3007065 (page-fault address)
> CR3:     0x1ffb000 (page-directory physical address)
> CR4:     0x220
> 
> Stack Dump:
> *0xffffffff80740e08 == 0xffffff8001a99b00
> *0xffffffff80740e10 == 0xffffffff80729b1f
> *0xffffffff80740e18 == 0x1
> *0xffffffff80740e20 == 0xffffffff80721c59
> *0xffffffff80740e28 == 0xffffffff
> *0xffffffff80740e30 == 0x0
> *0xffffffff80740e38 == 0xffffff801f02b740
> *0xffffffff80740e40 == 0xffffff801f02b740
> *0xffffffff80740e48 == 0x1
> *0xffffffff80740e50 == 0xffffff8001a99000
> *0xffffffff80740e58 == 0x7
> *0xffffffff80740e60 == 0x59
> *0xffffffff80740e68 == 0xffffff8001a99b00
> *0xffffffff80740e70 == 0xffffffff80729cf3
> *0xffffffff80740e78 == 0x9000000000000007
> *0xffffffff80740e80 == 0xffffff8001a99059
> *0xffffffff80740e88 == 0xffffff80ffffffff
> *0xffffffff80740e90 == 0xffffffff80740f18
> *0xffffffff80740e98 == 0xffffff800013c200
> *0xffffffff80740ea0 == 0x10000
> 
> Halting...
> halting...
> Kernel entry via Syscall, number: 1, Call
> Cap type: 10, Invocation tag: 16
> 
> 
> or
> 
> 
> 
> seL4 failed assertion
> '(cte_t*)mdb_node_get_mdbNext(destSlot->cteMDBNode) == NULL &&
> (cte_t*)mdb_node_get_mdbPrev(destSlot->cteMDBNode) == NULL' at
> src/object/cnode.c:457 in function cteInsert
> halting...
> Kernel entry via Syscall, number: 1, Call
> Cap type: 10, Invocation tag: 18
> 
> 
> 
> or a fault by the roottask (and not in the kernel). So - this is no
> issue for the kernel:
> 
> 
> Caught cap fault in send phase at address 0x0
> while trying to handle:
> vm fault on data at address 0x18 with status 0x4
> in thread 0xffffff800013c200 "child of: 'rootserver'" at address 0x20419cf
> 
> 
> 
> [0]
> https://github.com/alex-ab/genode/blob/crash_sel4_kernel_vanilla/autoconfig.patch
> [1]
> https://github.com/alex-ab/genode/raw/crash_sel4_kernel_vanilla/image_bomb.elf
> [2] https://github.com/alex-ab/genode/raw/crash_sel4_kernel_vanilla/sel4
> 
> [3] https://genode.org/download/tool-chain
> [4] https://sourceforge.net/projects/genode/files/genode-toolchain/17.05/
> 
> 

-- 
Alexander Boettcher
Genode Labs

http://www.genode-labs.com - http://www.genode.org

Genode Labs GmbH - Amtsgericht Dresden - HRB 28424 - Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth

_______________________________________________
Devel mailing list
[email protected]
https://sel4.systems/lists/listinfo/devel

Reply via email to