>That paper is the closest thing I have to a summary.
For branches, especially those with computed targets, you recommend the
single-step technique. That means: set the single-step flag (TF), return to
the guest, trap into the monitor again, and resume. That is a lot of
switches, and ring 0/3 switches are expensive on x86, partly because we
save and restore the whole CPU state.
Would it be cheaper to emulate all branches in the monitor?
Yes, in my page table virtualization code I also used single stepping, but
only because I didn't have an emulator on the guest side. Now that we have
one, perhaps someone could do some measurements?
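
Until someone has real numbers, here is a rough back-of-the-envelope model
of the trade-off, just to frame what should be measured. Everything in it is
an assumption on my part: the function names are made up, the cycle counts
are placeholders rather than measurements, and I assume that single stepping
costs two monitor round trips (four ring transitions) per branch while
emulation in the monitor costs one round trip plus the emulation work.

```python
# Rough cost model, NOT a measurement. All names and numbers are hypothetical.
# One "switch" is a single ring transition including the full CPU state
# save/restore; a round trip (guest -> monitor -> guest) is two switches.

def cost_single_step(switch_cost, native_branch_cost=1):
    """Cycles per branch when single stepping.

    Assumed sequence: trap to monitor, set the flag, return to guest,
    execute the branch natively, trap to monitor again, resume guest.
    That is four ring transitions in total.
    """
    return 4 * switch_cost + native_branch_cost


def cost_emulate(switch_cost, emulate_cost):
    """Cycles per branch when the monitor emulates the branch itself.

    Assumed sequence: trap to monitor, emulate, return to guest.
    That is two ring transitions plus the emulation work.
    """
    return 2 * switch_cost + emulate_cost


def break_even_emulate_cost(switch_cost, native_branch_cost=1):
    """Emulation cost at which both strategies cost the same."""
    return 2 * switch_cost + native_branch_cost


if __name__ == "__main__":
    switch = 500  # placeholder cycles per ring transition, not measured
    print("single step:", cost_single_step(switch))
    print("emulate    :", cost_emulate(switch, emulate_cost=200))
    print("break-even :", break_even_emulate_cost(switch))
```

Under these assumptions emulation wins whenever emulating one branch costs
less than two extra switches, so the number to measure first is the real
cost of one ring transition with the full state save/restore.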
jens