Ramon van Handel wrote:

> I hope you put the hack in a plugin, so it's clean and easy to remove again
> later...

I actually put it in a plugin!  I knew you would be into that.

> Hope to see CVS changes soon :)
> PS, I asked a while ago whether it wouldn't be possible to make the
> kernel/ directory a bit less messy by moving all of the specific
> emulation stuff into a separate kernel/emulation/ directory, but I
> never did get a reply.  So I ask again: would this be possible ?

It is possible, cleaner, and more modular to structure the
directories as you suggest above.  Mostly the static momentum
of all those bits piling up has diverted me into doing other
stuff. :^)  OK, I guess I'll do this.

> I think the problem is the unreliability of the current timing.
> BogoMIPS calculation relies on reliable timing...  if timing
> is very unreliable, it doesn't surprise me that it goes wrong.
> 
> Let me elaborate: in order for BogoMIPS calibration to work,
> the system needs to work in such a way that every time a unit
> of (PIT) time elapses, a roughly equal amount of instructions
> has been executed (the same instructions, the BogoMIPS loop
> is tight: see /usr/src/linux/arch/i386/lib/delay.c in
> linux 2.2.x).
> 
> How are you going to change this easily ?  The "shaping up" you
> talk about doesn't sound so straightforward to me... how about
> our previous idea, of adding timing IOCTLs to the kernel module
> and have it do the timing (still isn't easy, but it is a bit
> easier to get good control on timing).  Actually, this sounds
> like a rather difficult problem, because this "reliable" timing
> is not the same sort of timing we'd like to use for the rest
> of the VM run, which is a more synchronised-to-the-real-clock
> timing.

Remember back to the initial conversations regarding VM timing:
I always said that the timing reference should be derived by the
monitor, and not the user space code.  Ultimately the timing
framework, and important functions such as the PICs/PITs,
should be moved to the monitor, along with low-level functionality
from other critical components.  We can also have a user space
timing delivery mechanism for things like refreshing X-windows,
etc., which are wall clock based.

We get interrupts at N Hz based on the host PIT setting, and
exceptions (well, a lot) when SBE is on, and less frequently
when it's not.  So I'll take a TSC reading to monitor the duration
of the guest execution (could shave a little off for IRET/INT
overhead), and use that to derive a very accurate time base to
drive devices such as the PITs in the monitor.

Though a TSC read itself is very accurate, the resolution of *when*
we can read the TSC has a lower bound defined by the host PIT
frequency (100Hz in Linux), and an upper bound defined by the
virtualization exceptions which occur between host timer ticks.

It would also be nice to receive an interrupt at an arbitrary time
between host timer ticks, based on when a timer facility in the
guest needs to go off.  This would increase our timer resolution greatly.
Some ideas based on whether certain facilities are available
on the current host:

  Use LAPIC self-interrupt
  Use performance counters (overflow event)
  Temporarily control the PIT
  Temporarily control the CMOS RTC
  Burn host cycles until the PIT countdown equals the
    next guest event interval.  Ugh!

Throw out some other ideas if you have more.

-Kevin
