On 02.04.2009 at 11:03, Nils Faerber wrote:
Dr. H. Nikolaus Schaller wrote:
Hi all,
just an interim update for everyone (who does not know yet):
* Nils has tracked down the illegal instruction problem a little further,
but no solution has been found yet
- it seems to disappear when a kernel with caching disabled is used
(horribly slow)
- it always seems to end up in the 'unaligned access' handler in the
kernel (which means fetching data or instructions from an unaligned
address)
Or, to be more exact, the fault triggers a coprocessor 1 (CP1) exception,
which is used by many parts, like unaligned access handling, FPU
emulation and several other things, and of course also by genuine
illegal instructions.
- it is not an illegal instruction in the executing (user space) code
(which could come from a broken glibc)
Right. Also, if random data in the cache were being executed, I would
expect random faults (illegal instructions, segfaults, etc.), not *only*
this one illegal instruction fault, always with the same pathology.
- it does not always appear at the same location
... of the running application.
This points to some unpredictable (external) influence on the running
process:
* timers
* threads
* signals
- it is difficult to reproduce when running in gdb (but it does show up
there as well)
Right. And the GDB result is inconclusive.
- so this all indicates a real kernel synchronization issue
* Nils has started a discussion on the MIPS developer list:
http://www.nabble.com/Ingenic-JZ4730---illegal-instruction-td22376000.html
* I have found a similar project that has built a working kernel for the
JZ4740-based Onda VX747 media player
http://code.google.com/p/jz-hacking/wiki/Index?tm=6
* we have contacted the developers, and they also use the Ingenic kernel
sources and patches, as we do.
One difference is that they build their own toolchain, but it is quite
unlikely that this is the reason for the issue.
Right.
I also had a peek into the Ingenic toolchain they provide on their
website. I have not looked through the full 160MB yet, but it seems that
they do not add any JZ patches to GCC, binutils or glibc.
Well, it could also be the other way round: the OE toolchain may add
some patch that breaks things (only in our specific case).
So I would also look into that possibility.
So the place to look for the fault remains the kernel :(
What I am really wondering is why certain applications never fail while
others do. E.g. the Xfbdev X server, which is big (uses lots of RAM) and
uses all kinds of peripherals, never fails. An XTerm never fails.
Applications that do fail include many GTK+ applications, even pretty
simple ones.
This is really weird. I still have one idea: it might have to do with
threads. I think all the failing test applications initialise threads at
startup. But even if it has to do with threads, what does that tell us?
Hm. Toolchain? And threads in glibc need some kernel support (which is
the reason why I can't get Debian Lenny to run on the Linux 2.4 kernel).
When building my Darwin-hosted toolchains, I spent several weeks ironing
out all the TLS and thread management options and patches.
What still puzzles me is the damaged frame pointer register in the
stack trace we discussed recently.
This also indicates stack corruption and makes something as simple as a
"return" fail... It may also be the cause of the failure rather than a
result of it.
So I would suggest, if possible, compiling everything with a different
toolchain and then doing a diff, or at least comparing the file sizes of
some core components.
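The file size comparison could be scripted roughly as follows. This is only a minimal sketch in Python, assuming both builds (e.g. the OE-built image and one built with another toolchain) have been unpacked into two local directory trees; `file_sizes` and `compare_trees` are hypothetical helpers, not part of any existing tool. Equal sizes do not prove equal contents, so files that match in size would still need a real diff or checksum comparison.

```python
import os
import sys


def file_sizes(root):
    """Map each regular file's path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so dangling links in a rootfs do not break the walk.
            if os.path.islink(full):
                continue
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def compare_trees(old_root, new_root):
    """Return (path, old_size, new_size) for every file whose size differs.

    A size of None means the file exists in only one of the two trees.
    """
    old = file_sizes(old_root)
    new = file_sizes(new_root)
    diffs = []
    for rel in sorted(set(old) | set(new)):
        a, b = old.get(rel), new.get(rel)
        if a != b:
            diffs.append((rel, a, b))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_sizes.py <build-tree-A> <build-tree-B>
    for rel, a, b in compare_trees(sys.argv[1], sys.argv[2]):
        print(f"{rel}: {a} -> {b}")
```

For example: python compare_sizes.py rootfs-oe/ rootfs-other/ (the script and directory names are placeholders).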
Nikolaus
_______________________________________________
Mipsbook-devel mailing list
Mipsbook-devel@linuxtogo.org
http://lists.linuxtogo.org/cgi-bin/mailman/listinfo/mipsbook-devel