Hi there,

there is some support for multi-threaded processes in ltrace, but so far it has been incomplete. Everything works as long as the threads stay away from each other, but as soon as they end up in the same area of code, it all breaks.

The problem is due to return breakpoints. When two threads enter the same function call, ltrace places two breakpoints on top of each other, because it has no concept of a shared address space. This causes a number of problems: ltrace ends up seeing unexpected breakpoints, and eventually crashes the traced process with SIGSEGV.

To solve this, ltrace must first learn that there is such a thing as a task and a thread group. Then it needs to store all breakpoints in a structure shared by all tasks in the thread group. To prevent races, all tasks in the thread group must be stopped before any breakpoint is temporarily disabled (for re-enablement, namely continue_after_breakpoint).

There is code on the branch pmachata/threads that implements this. Here's roughly what the branch does:

- "Process * leader;" was added to struct Process. It points to the process that is the leader of the thread group that this process is a member of.

- Proper interfaces were added for handling the set of processes and their tasks (add_process, remove_process, each_process, each_task). The iteration interfaces (each_*) use callbacks to do the real work.

- Interfaces were added for accessing information about processes (process_leader, process_tasks, process_stopped, process_status).

- A new interface, task_kill, is a wrapper for the SYS_tkill system call, which is not wrapped by glibc. We use it to stop or continue a single task.

- When we need to stop tasks for breakpoint re-enablement, we send SIGSTOP. This SIGSTOP has to be caught and sunk. While we wait for the signal to be delivered, we pump all incoming events into an event queue that was created for this purpose (each_qd_event, enque_event). The interface next_event takes events from the queue if there are any.

- All of this (the event interception, the sinking of SIGSTOP, etc.) is very platform specific, so a thread group can now have a registered event handler (install_event_handler, destroy_event_handler). If present, it is called at the beginning of handle_event. The registered handler can do whatever it wishes with the event in question, and returns either NULL (if the event was handled or sunk) or the original (possibly modified) event, which is then handled by the default handler as usual.

- There have also been some small cleanups.

For some reason, attaching to a running multi-threaded task doesn't work (this was one of the first things that I fixed, but apparently it got broken in the meantime), so that's what I'll be doing next. Then comes cleaning it all up and making the git history of my branch a bit less messy, at which point I'd ask some of you to review the (rather large) patch. I also need to verify that it works on non-x86 architectures; so far I have only been working on x86_64.

I'll keep you posted as my work progresses. Any comments are welcome.

Thanks,
PM
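
For reference, a wrapper like the task_kill mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code on the branch: the signature of task_kill is an assumption, and task_stop, task_cont and task_exists are hypothetical helpers added only to show how such a wrapper might be used. The only facts relied on are that Linux provides SYS_tkill and that syscall(2) can invoke it directly.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `sig` to the single task `tid` (not to the whole thread group).
 * Returns 0 on success, -1 with errno set on failure.
 * The signature is an assumption; the branch may declare it differently. */
int task_kill(pid_t tid, int sig)
{
    return (int)syscall(SYS_tkill, tid, sig);
}

/* Hypothetical helper: stop one task. */
int task_stop(pid_t tid)
{
    return task_kill(tid, SIGSTOP);
}

/* Hypothetical helper: let one task continue. */
int task_cont(pid_t tid)
{
    return task_kill(tid, SIGCONT);
}

/* Hypothetical helper: probe a task with signal 0, which performs the
 * existence and permission checks without delivering anything.
 * Returns 1 if the task exists, 0 if it does not, -1 on another error. */
int task_exists(pid_t tid)
{
    if (task_kill(tid, 0) == 0)
        return 1;
    return errno == ESRCH ? 0 : -1;
}
```

Unlike kill(2), which targets the whole thread group, tkill addresses one task by its thread ID, which is what stopping individual tasks around a breakpoint requires.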

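
The registered event handler described above, which returns NULL for a sunk event or hands the event on to the default handler, can be illustrated with a small sketch. All names and types here (struct event, thread_group_sketch, sink_sigstop, dispatch_event) are hypothetical and much simpler than the real ltrace structures; the sketch only shows the NULL-or-event contract, under the assumption that the handler carries a count of the SIGSTOPs it is still expecting.

```c
#include <signal.h>
#include <stddef.h>

/* Hypothetical, simplified event representation. */
enum event_type { EVENT_SIGNAL, EVENT_BREAKPOINT };

struct event {
    enum event_type type;
    int pid;
    int signum;   /* meaningful only for EVENT_SIGNAL */
};

/* A handler returns NULL if it consumed the event, or the (possibly
 * modified) event to pass on to default handling. */
typedef struct event *(*event_handler_fn)(struct event *ev, void *data);

/* Hypothetical stand-in for the per-thread-group state. */
struct thread_group_sketch {
    event_handler_fn handler;   /* NULL if none is installed */
    void *handler_data;
};

/* Sink as many SIGSTOPs as we sent ourselves; pass everything else on.
 * `data` points to an int counting the stops still expected. */
struct event *sink_sigstop(struct event *ev, void *data)
{
    int *pending_stops = data;
    if (ev->type == EVENT_SIGNAL && ev->signum == SIGSTOP
        && *pending_stops > 0) {
        --*pending_stops;
        return NULL;            /* event sunk */
    }
    return ev;                  /* not ours, let the default handler see it */
}

/* Returns 0 if the registered handler consumed the event,
 * 1 if it reached default handling. */
int dispatch_event(struct thread_group_sketch *group, struct event *ev)
{
    if (group->handler != NULL) {
        ev = group->handler(ev, group->handler_data);
        if (ev == NULL)
            return 0;
    }
    /* Default handling of *ev would go here. */
    return 1;
}
```

Counting the expected stops keeps a SIGSTOP that came from somewhere else from being swallowed by accident.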