ard wrote:
> On Thu, Apr 02, 2009 at 11:49:55AM +0200, Nils Faerber wrote:
>>> So I would suggest - if possible - to compile everything with a
>>> different toolchain and then do a diff or at least compare file sizes of
>>> some core components.
>> This is not that easy. We are talking about recompiling *everything*,
>> i.e. the whole distribution. Tools like OE have been created because
>> cross distribution building is not that easy.
> I tend to go blackbox:
> We have to focus on only making an application SIGILL fast and
> reproducible.
> I mean: a minimal rootfs that probably only has a linuxrc or
> something like that, that makes an application SIGILL, and
> halts or reboots.
> I also think that threads are the problem, but we can't say for
> sure. Anyway: I guess the threading works until some other
> process introduces something into the cache. So making it
> reproducible can be a little hard.

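A loop like the following could serve as a starting point for the "SIGILL fast and reproducible" part. It is only a sketch under assumptions: it needs a Python interpreter on the target rootfs (which a truly minimal linuxrc-only image would not have), and the function name `count_sigills` and its parameters are made up for illustration. It starts a command repeatedly and counts how often the process is killed by SIGILL; a run that is still alive after the timeout is killed and counted as a survivor.

```python
import signal
import subprocess
import sys


def count_sigills(cmd, runs=20, timeout=10.0):
    """Run cmd `runs` times; return how many runs were killed by SIGILL."""
    sigills = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # Still running after `timeout` seconds: subprocess.run has
            # already killed it, so count this run as "did not SIGILL".
            continue
        # On POSIX a return code of -N means "terminated by signal N".
        if result.returncode == -signal.SIGILL:
            sigills += 1
    return sigills
```

For example, `count_sigills(["gpe-info"], runs=20, timeout=5)` would start gpe-info twenty times; the result could then be logged before halting or rebooting.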
I am not that sure about the threads - if I remember correctly, the
Debian Etch rootfs, which did not use the TLS extension, also sigilled.
It may be that threads simply trigger the bug but are not the real cause.

> As for a solution: the system can reboot fast. I guess we can
> start with putting some cache flushes around an important
> schedule point so that it disappears, and then move those flushes
> to more specific scheduler functions, or even look at what kind
> of flush should be done.
> That's not a fun job, but it gets us to a point where we can
> start pointing at things, and finally makes us understand it.
> So we are here:
> 1) We have apps that sigill. Great, we have to detect that, log it
> and reboot.
> 2) We have a variant that works (cache turned off). Great, we have
> to detect that, log it and reboot.
> The next step is to get to step 2 with mere cache flushes.
> For that we have to know the different caches:
> TLB, icache, dcache, maybe flush by cacheline?
> If we can get to 2 with cache flushes, we can start pointing at a
> cache.
>
> Anyway: work says more than theories.
> Nils: do you have a setup that boots a new kernel quickly, and a setup
> that SIGILLs fast?

Well, ATM I use my OE-built rootfs and start gpe-info several times. It
fails pretty soon.

The cache flushing is also something I thought of, but when do you want
to do that? You have to insert that call somewhere - where? And second,
I was not really able to pinpoint specific flush instructions in the
kernel. It seems that the cache flushes are done implicitly, i.e. they
are triggered and then functions from arch/mips/mm/ are called that
handle it. The whole area is also sprinkled with mean macros :(

So if you can tell me a way to flush the caches and have an idea where
to put such flushes, I can happily make all kinds of tests for you.

> Or should I just focus on the SIGILL fast user-space part? (To
> speed up testing)

I think testing is pretty easy (e.g. using the above rootfs and app).
It only takes me 30 secs up to max. 1 minute to test a kernel (rootfs
and kernel via network). I am just pretty clueless now where to put any
new tests into the kernel.

Cheers
  nils faerber

-- 
kernel concepts GbR      Tel: +49-271-771091-12
Sieghuetter Hauptweg 48  Fax: +49-271-771091-19
D-57072 Siegen           Mob: +49-176-21024535
http://www.kernelconcepts.de
_______________________________________________
Mipsbook-devel mailing list
Mipsbook-devel@linuxtogo.org
http://lists.linuxtogo.org/cgi-bin/mailman/listinfo/mipsbook-devel