[LTP] LTP

Michal Simek Mon, 13 Jul 2009 03:45:00 -0700

Hi All,

I would like to share with the mailing list some more information about
how LTP is helping us to remove bugs
from the Microblaze kernel/toolchain code, and of course how it can help
with future kernel bug hunting on new archs.


I am trying to resolve problems in the Microblaze MMU kernel code.
Anyway, I think you know that
Microblaze is in mainline. :-)

Ok. We run runtest/syscalls to check if our kernel/ABI works as expected.
I have found three big bugs which I would like to talk about.

The first is a problem with a missing TLB flush in the MMU code, which
showed up in the mmap01 test.
I think you saw the thread on the list too.
This problem was very hard to debug because adding any printk debug
messages made the test behave correctly.
LTP has nine standard mmap tests and only mmap01 failed. I found
that calling the sync() syscall
made the test behave correctly too (just as printk did). When I ran
mmap01 -c 100, the first few iterations failed and the rest passed.
This was the first moment when I wanted to use the Microblaze Qemu
emulator to find out where the problem was. (Thanks to Edgar, the author
of the Qemu Microblaze port, for his huge help.)
The emulator helped me to see what Microblaze does. I turned on program
counter tracing to see what the Linux kernel really does.
I was able to see the program counter and the full execution flow, plus
the MMU behavior: TLB hits/misses and interrupts.
Unfortunately I was not able to see TLB invalidation (we will upgrade
the emulator to support this too :-) )
When I ran mmap01 -c 100, I saw that the first few iterations, at the
top of the log, failed and the iterations which passed were at the bottom.
I asked the Microblaze hw guys for help and they pointed out that the
Microblaze code does not flush the TLB after mmap.

What did Microblaze do? Microblaze has 64 TLB entries and we use 2 fixed
entries for kernel code, to speed up the kernel.
This change meant that we do not get many TLB misses, so the old mapping
stayed in the TLB and wasn't replaced by
the updated one (the mmap syscall does that update). That's why the
kernel needs to flush the TLB entry for the old mapping. When I added
printk debug messages or called the sync syscall,
the kernel flushed more TLB entries and the test passed. The same thing
happened when I ran 100 iterations. The first few iterations failed
because they weren't interrupted by any other code. The iterations which
passed, at the bottom of my log, were interrupted
by some other code, which caused the old TLB mapping to be flushed, so
the next access used the new one.
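
The stale-entry effect described above can be sketched with a toy
software TLB. This is only an illustration, not Microblaze kernel code:
the direct-mapped layout, the page table size and all function names
here are assumptions made for the example. It shows that a remap without
a flush keeps returning the old frame until something unrelated evicts
the entry, which is roughly what printk or sync did by accident.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64   /* TLB size mentioned in the text */
#define NUM_VPAGES  256  /* hypothetical size of the toy page table */

struct tlb_entry {
    bool     valid;
    uint32_t vpn;   /* virtual page number */
    uint32_t pfn;   /* physical frame number */
};

static uint32_t page_table[NUM_VPAGES];   /* vpn -> pfn, the "truth" */
static struct tlb_entry tlb[TLB_ENTRIES]; /* direct-mapped for simplicity */

/* Translate a virtual page; reload from the page table only on a miss. */
static uint32_t translate(uint32_t vpn)
{
    assert(vpn < NUM_VPAGES);
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
    if (!e->valid || e->vpn != vpn) {
        e->valid = true;
        e->vpn = vpn;
        e->pfn = page_table[vpn];
    }
    return e->pfn;
}

/* Drop the cached translation for one page, if it is present. */
static void toy_flush_tlb_page(uint32_t vpn)
{
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->vpn == vpn)
        e->valid = false;
}

/* Models the bug: the page table changes but the TLB is left alone. */
static void remap_no_flush(uint32_t vpn, uint32_t new_pfn)
{
    assert(vpn < NUM_VPAGES);
    page_table[vpn] = new_pfn;
}

/* Models the fix: update the page table and flush the old entry. */
static void remap_with_flush(uint32_t vpn, uint32_t new_pfn)
{
    remap_no_flush(vpn, new_pfn);
    toy_flush_tlb_page(vpn);
}
```

With this model, translate() after remap_no_flush() still returns the
old frame, while translate() after remap_with_flush() returns the new one.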

The result is that LTP has more than 70 tests which use the mmap syscall
but only two tests uncovered the big problem with the TLB flush.
The first test was mmap01 and the second, which was a bonus for me, was
shmdt01.



The second problem I ran into was with the fallocate01 syscall test.
It wasn't too hard to fix because after some
printk debugging I found that we had a problem with u64 parameters
(Microblaze is a 32-bit cpu). The problem was
that glibc wasn't able to pass the sixth parameter to the kernel when we
used the syscall macro, because the syscall macro
takes 7 parameters, the first being the syscall number. Those six
parameters are assembled from u32 and u64 values, where
Microblaze uses the convention that the higher u32 goes in one register
and the lower u32 in the next.
Microblaze uses r5-r10 for passing parameters to a function/syscall.

The mapping for the syscall function is:
r5= syscall number
r6= 1. parameter
r7= 2. parameter
r8= 3. parameter
r9= 4. parameter
r10= 5. parameter
and the 6. parameter is on the stack.

The glibc syscall function just shifts the parameters across registers:
r5 moves to r12 (syscall number reg)
r6 -> r5
r7 -> r6
r8 -> r7
r9 -> r8
r10 -> r9
and r10 keeps the same value as it had before.
The Microblaze toolchain did not load the sixth parameter from the stack,
which we fixed.
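
The register shuffle above can be modelled in plain C to show why the
sixth parameter gets lost. This is a sketch, not the glibc assembly: the
struct, the function names and the load_sixth switch are made up for
illustration. A call shaped like fallocate(fd, mode, offset, len) with
two 64-bit values needs six 32-bit slots under the high/low convention
described above, so the low half of len is the one that sits on the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file: only r5..r10 and r12 matter for this sketch. */
struct toy_regs {
    uint32_t r[32];
};

/* Split a u64 into two u32 slots: high half first, low half next. */
static void split_u64(uint64_t val, uint32_t *hi, uint32_t *lo)
{
    assert(hi != 0 && lo != 0);
    *hi = (uint32_t)(val >> 32);
    *lo = (uint32_t)(val & 0xffffffffu);
}

/*
 * Shift the parameters as described in the text. With load_sixth == 0
 * r10 keeps its old value (the buggy behaviour); with load_sixth != 0
 * the sixth parameter is fetched from the stack slot (the fix).
 */
static void toy_syscall_shuffle(struct toy_regs *regs,
                                const uint32_t *stack_arg6,
                                int load_sixth)
{
    assert(regs != 0 && stack_arg6 != 0);
    regs->r[12] = regs->r[5];   /* syscall number */
    regs->r[5]  = regs->r[6];   /* 1. parameter */
    regs->r[6]  = regs->r[7];   /* 2. parameter */
    regs->r[7]  = regs->r[8];   /* 3. parameter */
    regs->r[8]  = regs->r[9];   /* 4. parameter */
    regs->r[9]  = regs->r[10];  /* 5. parameter */
    if (load_sixth)
        regs->r[10] = *stack_arg6;  /* 6. parameter from the stack */
}
```

In the buggy variant r9 and r10 end up holding the same value, so the
kernel sees the fifth parameter twice and never sees the sixth.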

Thanks to LTP we found a bug in the toolchain. Affected tests:
fallocate01, fallocate02, fallocate03, sync_file_range01.


Last but not least, the test which helped me find a problem in the
kernel was eventfd01.
The eventfd syscall test sets up an eventfd in the kernel and then tries
to read the value of the kernel counter. This value is 64-bit. To pass
this value back to the user application, the put_user macro is used with
a 64-bit value. We used special asm code for passing
64-bit values but this is wrong. A lot of applications use this
code but only two LTP tests found the problem in it.
We fixed it by calling two put_user macros for u32 values (as Blackfin
does), which fixed two LTP tests - eventfd01 and sendfile02_64.
I am still working on this case because I am missing some pieces of the
puzzle, but only two tests point to the kernel put_user problem.
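
The idea of writing a 64-bit value as two 32-bit stores can be sketched
in user space as below. This is not the kernel put_user macro and not
the actual Microblaze fix: toy_put_user_u32 is a hypothetical stand-in
that just copies memory, whereas the real put_user also checks the user
pointer and handles faults. The order of the two halves follows the byte
order of the machine the sketch runs on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 32-bit put_user(); returns 0 on success. */
static int toy_put_user_u32(uint32_t val, void *dst)
{
    assert(dst != NULL);
    memcpy(dst, &val, sizeof(val));
    return 0;
}

/* Detect the host byte order at run time. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 0;
}

/* Store a u64 as two u32 writes instead of one 64-bit access. */
static int toy_put_user_u64(uint64_t val, void *dst)
{
    const uint32_t hi = (uint32_t)(val >> 32);
    const uint32_t lo = (uint32_t)(val & 0xffffffffu);
    uint8_t *p = dst;
    int err;

    /* On big endian the high half lives at the lower address. */
    err = toy_put_user_u32(host_is_big_endian() ? hi : lo, p);
    if (err)
        return err;
    return toy_put_user_u32(host_is_big_endian() ? lo : hi, p + 4);
}
```

Reading the destination back as one 64-bit value should give the
original number, which is the property eventfd01 depends on when it
reads the counter.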


We have some other problems which we will have to solve.
I am surprised how much LTP helps us to find kernel and toolchain
problems. It would be very hard to find
these problems without LTP. I also used a simple LTP compilation as a
quick toolchain test because of its good coverage.


Thanks,
Michal


-- 
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663

