http://www.matasano.com/log/628/rafal-wojtczuks-user-mode-single-stepping-100x-faster-than-debuggers/

Rafal Wojtczuk’s User-Mode Single Stepping: 100x Faster Than Debuggers

Thomas Ptacek | November 30th, 2006 | Filed Under: Bitching About Protocols, Reversing, Uncategorized

‘Tis the season, apparently. Cool new project and excellent blog post from libnids author and Eastern European reversing psychopath Rafal Wojtczuk, now at MCAF’s AVERT labs. He’s announced UMSS, the User Mode Single-Stepper, a tool for tracing the execution of Win32 binaries.

Refresher: single-stepping stops a program after each individual CPU instruction, usually to record them. It’s usually done with a debugger; on Intel, you do it by setting the “trap flag”, which tells the CPU to generate exceptions after each instruction.

The problem here is, each instruction traps to the kernel, which then transfers control to another process, which then transfers back to the kernel to find out what happened. A single user/kernel (u/k) transition is expensive: network programmers, who execute thousands of instructions between I/O operations, still try to minimize them. Debugger single-stepping involves multiple u/k context switches per instruction. It’s just nightmarishly slow.

Rafal’s project speeds this up by 2 orders of magnitude by single stepping entirely in userland. How he does it is, he continuously rewrites the “next” instruction on the fly to transfer control to a handler function. This is similar to what Detours does in that Rafal is swapping out instructions with handler jumps. But Detours only instruments the prologues of each function. UMSS instruments every instruction, on the fly.

This is tricky, because to do that for each instruction, you have to know where the next instruction is. It’s not always “the next instruction in memory”, because of jumps. It’s not always “the target of a jump”, because jumps are conditional. It’s not always even possible to look at an instruction and know the jump target, because jumps can be indirected through registers. UMSS solves this problem in two ways:
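To make the “where is the next instruction?” problem concrete, here is a minimal Python sketch. It is not UMSS’s code and not either of its two solutions; it only illustrates the difficulty described above. The `Insn` record, the four instruction kinds, and both function names are made up for illustration. The one real x86 detail is the patch encoding: a near `JMP rel32` is opcode `0xE9` followed by a 32-bit little-endian displacement measured from the end of the 5-byte jump.

```python
import struct
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Insn:
    """A toy decoded instruction (hypothetical, not a real x86 decoder)."""
    addr: int                     # address of the instruction
    length: int                   # size in bytes
    kind: str                     # "plain", "jmp", "jcc", or "jmp_reg"
    target: Optional[int] = None  # static target for "jmp" / "jcc"
    reg: Optional[str] = None     # register name for "jmp_reg"


def next_address(insn: Insn, regs: Dict[str, int], cond_taken: bool) -> int:
    """Return where execution goes after `insn`.

    Only "plain" and "jmp" can be answered from the instruction bytes alone.
    "jcc" needs the runtime flags; "jmp_reg" needs the runtime registers.
    """
    fallthrough = insn.addr + insn.length
    if insn.kind == "plain":
        return fallthrough                  # next instruction in memory
    if insn.kind == "jmp":
        return insn.target                  # unconditional, static target
    if insn.kind == "jcc":
        return insn.target if cond_taken else fallthrough
    if insn.kind == "jmp_reg":
        return regs[insn.reg]               # target only known at runtime
    raise ValueError(f"unknown instruction kind: {insn.kind}")


def encode_jmp_rel32(patch_addr: int, handler_addr: int) -> bytes:
    """Encode an x86 near jump from patch_addr to handler_addr.

    The displacement is relative to the address just past the 5-byte jump.
    """
    displacement = handler_addr - (patch_addr + 5)
    return b"\xE9" + struct.pack("<i", displacement)


if __name__ == "__main__":
    regs = {"eax": 0x00405000}
    jcc = Insn(addr=0x00401000, length=2, kind="jcc", target=0x00401020)
    indirect = Insn(addr=0x00401002, length=2, kind="jmp_reg", reg="eax")
    print(hex(next_address(jcc, regs, cond_taken=False)))
    print(hex(next_address(indirect, regs, cond_taken=False)))
    print(encode_jmp_rel32(0x00401000, 0x00400000).hex())
```

The point of the sketch is the two extra parameters to `next_address`: a stepper that plants its handler jump ahead of time cannot know `cond_taken` or the register contents until the current instruction is about to execute, which is exactly why instrumenting every instruction is harder than instrumenting function prologues.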
Why is this stuff important? To be honest, I don’t know. The “state of the art” in tracing programs right now is in instrumenting basic blocks, which are the ~10-20 instruction chunks that functions are composed of. For reversing purposes, that level of detail is usually more than enough. Clearly, for malware research, where code is deliberately designed to be unclear, instruction-by-instruction detail is critical. I’d love for someone to tell me how I could exploit fast single-stepping to get a different project done.

The bigger story is the apparent renaissance we’re experiencing in binary program manipulation. Seven years ago, technology like Detours, PaiMei, and UMSS would have been the closely guarded crown jewels of security companies. Now they’re free side projects.