Re: kvmclock doesn't work, help?

2015-12-23 Thread Marcelo Tosatti
On Mon, Dec 21, 2015 at 02:49:25PM -0800, Andy Lutomirski wrote:
> On Fri, Dec 18, 2015 at 1:49 PM, Marcelo Tosatti  wrote:
> > On Fri, Dec 18, 2015 at 12:25:11PM -0800, Andy Lutomirski wrote:
> >> [cc: John Stultz -- maybe you have ideas on how this should best
> >> integrate with the core code]
> >>
> >> On Fri, Dec 18, 2015 at 11:45 AM, Marcelo Tosatti wrote:
> 
> >> > Can you write an actual proposal (with details) that accommodates the
> >> > issue described at "Assuming a stable TSC across physical CPUS, and a
> >> > stable TSC" ?
> >> >
> >> > Yes it would be nicer, the IPIs (to stop the vcpus) are problematic for
> >> > realtime guests.
> >>
> >> This shouldn't require many details, and I don't think there's an ABI
> >> change.  The rules are:
> >>
> >> When the overall system timebase changes (e.g. when the selected
> >> clocksource changes or when update_pvclock_gtod is called), the KVM
> >> host would:
> >>
> >> optionally: preempt_disable();  /* for performance */
> >>
> >> for all vms {
> >>
> >>   for all registered pvti structures {
> >>     pvti->version++;  /* should be odd now */
> >>   }
> >
> > pvti is userspace data, so you have to pin it before?
> 
> Yes.
> 
> Fortunately, most systems probably only have one page of pvti
> structures, I think (unless there are a ton of vcpus), so the
> performance impact should be negligible.
> 
> >
> >>   /* Note: right now, any vcpu that tries to access pvti will start
> >>      infinite looping.  We should add cpu_relax() to the guests. */
> >>
> >>   for all registered pvti structures {
> >>     update everything except pvti->version;
> >>   }
> >>
> >>   for all registered pvti structures {
> >>     pvti->version++;  /* should be even now */
> >>   }
> >>
> >>   cond_resched();
> >> }
> >>
> >> Is this enough detail?  This should work with all existing guests,
> >> too, unless there's a buggy guest out there that actually fails to
> >> double-check version.
> >
> > What is the advantage of this over the brute force method, given
> > that guests will busy spin?
> >
> > (busy spin is equally problematic as IPI for realtime guests).
> 
> I disagree.  It's never been safe to call clock_gettime from an RT
> task and expect a guarantee of real-time performance.  We could fix
> that, but it's not even safe on non-KVM.

The problem is how long the IPI (or busy spinning in case of version
above) interrupts the vcpu.

> Sending an IPI *always* stalls the task.  Taking a lock (which is
> effectively what this is doing) only stalls the tasks that contend for
> the lock, which, most of the time, means that nothing stalls.
> 
> Also, if the host disables preemption or otherwise boosts its priority
> while version is odd, then the actual stall will be very short, in
> contrast to an IPI-induced stall, which will be much, much longer.
> 
> --Andy

1) The updates are rare.
2) There are no user complaints about the IPI mechanism.

Don't see a reason to change this.

For the suspend issue, though, there are complaints (guests on
laptops that fail to use the masterclock).

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmclock doesn't work, help?

2015-12-23 Thread Andy Lutomirski
On Wed, Dec 23, 2015 at 11:27 AM, Marcelo Tosatti  wrote:
> On Mon, Dec 21, 2015 at 02:49:25PM -0800, Andy Lutomirski wrote:
>> On Fri, Dec 18, 2015 at 1:49 PM, Marcelo Tosatti  wrote:
>> > (busy spin is equally problematic as IPI for realtime guests).
>>
>> I disagree.  It's never been safe to call clock_gettime from an RT
>> task and expect a guarantee of real-time performance.  We could fix
>> that, but it's not even safe on non-KVM.
>
> The problem is how long the IPI (or busy spinning in case of version
> above) interrupts the vcpu.

The busy spin should be a few hundred cycles in the very worst case (a
couple of remote cache misses timed such that the guest is spinning
the whole time).  The IPI is always thousands of cycles no matter what
the guest is doing.

>
>> Sending an IPI *always* stalls the task.  Taking a lock (which is
>> effectively what this is doing) only stalls the tasks that contend for
>> the lock, which, most of the time, means that nothing stalls.
>>
>> Also, if the host disables preemption or otherwise boosts its priority
>> while version is odd, then the actual stall will be very short, in
>> contrast to an IPI-induced stall, which will be much, much longer.
>>
>> --Andy
>
> 1) The updates are rare.
> 2) There are no user complaints about the IPI mechanism.

If KVM ever starts directly propagating corrected time
(CLOCK_MONOTONIC, for example), then the updates won't be rare.

Maybe I'll try to instrument this.

--Andy


Re: kvmclock doesn't work, help?

2015-12-21 Thread Andy Lutomirski
On Fri, Dec 18, 2015 at 1:49 PM, Marcelo Tosatti  wrote:
> On Fri, Dec 18, 2015 at 12:25:11PM -0800, Andy Lutomirski wrote:
>> [cc: John Stultz -- maybe you have ideas on how this should best
>> integrate with the core code]
>>
>> On Fri, Dec 18, 2015 at 11:45 AM, Marcelo Tosatti wrote:

>> > Can you write an actual proposal (with details) that accommodates the
>> > issue described at "Assuming a stable TSC across physical CPUS, and a
>> > stable TSC" ?
>> >
>> > Yes it would be nicer, the IPIs (to stop the vcpus) are problematic for
>> > realtime guests.
>>
>> This shouldn't require many details, and I don't think there's an ABI
>> change.  The rules are:
>>
>> When the overall system timebase changes (e.g. when the selected
>> clocksource changes or when update_pvclock_gtod is called), the KVM
>> host would:
>>
>> optionally: preempt_disable();  /* for performance */
>>
>> for all vms {
>>
>>   for all registered pvti structures {
>>     pvti->version++;  /* should be odd now */
>>   }
>
> pvti is userspace data, so you have to pin it before?

Yes.

Fortunately, most systems probably only have one page of pvti
structures, I think (unless there are a ton of vcpus), so the
performance impact should be negligible.

>
>>   /* Note: right now, any vcpu that tries to access pvti will start
>>      infinite looping.  We should add cpu_relax() to the guests. */
>>
>>   for all registered pvti structures {
>>     update everything except pvti->version;
>>   }
>>
>>   for all registered pvti structures {
>>     pvti->version++;  /* should be even now */
>>   }
>>
>>   cond_resched();
>> }
>>
>> Is this enough detail?  This should work with all existing guests,
>> too, unless there's a buggy guest out there that actually fails to
>> double-check version.
>
> What is the advantage of this over the brute force method, given
> that guests will busy spin?
>
> (busy spin is equally problematic as IPI for realtime guests).

I disagree.  It's never been safe to call clock_gettime from an RT
task and expect a guarantee of real-time performance.  We could fix
that, but it's not even safe on non-KVM.

Sending an IPI *always* stalls the task.  Taking a lock (which is
effectively what this is doing) only stalls the tasks that contend for
the lock, which, most of the time, means that nothing stalls.

Also, if the host disables preemption or otherwise boosts its priority
while version is odd, then the actual stall will be very short, in
contrast to an IPI-induced stall, which will be much, much longer.
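
[Archive note] The version protocol in the quoted pseudocode can be sketched in user-space C roughly as follows. This is an illustration under assumptions, not the KVM implementation: struct pvti_sketch, host_update_all() and guest_read() are hypothetical names, the structure carries only two payload fields, and sequentially consistent C11 atomics stand in for whatever barriers real host and guest code would choose.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one per-vcpu pvti structure. */
struct pvti_sketch {
    _Atomic uint32_t version;        /* odd while an update is in flight */
    _Atomic uint64_t tsc_timestamp;
    _Atomic uint64_t system_time;
};

struct pvti_snapshot {
    uint64_t tsc_timestamp;
    uint64_t system_time;
};

/* Host side: the three passes from the proposal, over n structures. */
static void host_update_all(struct pvti_sketch *pvti, size_t n,
                            uint64_t tsc, uint64_t systime)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = atomic_fetch_add(&pvti[i].version, 1) + 1;
        assert(v & 1);               /* should be odd now */
    }
    for (size_t i = 0; i < n; i++) {
        /* update everything except version */
        atomic_store(&pvti[i].tsc_timestamp, tsc);
        atomic_store(&pvti[i].system_time, systime);
    }
    for (size_t i = 0; i < n; i++) {
        uint32_t v = atomic_fetch_add(&pvti[i].version, 1) + 1;
        assert(!(v & 1));            /* should be even now */
    }
}

/* Guest side: retry until the same even version is seen before and after. */
static struct pvti_snapshot guest_read(struct pvti_sketch *p)
{
    struct pvti_snapshot s;
    uint32_t before, after;

    do {
        before = atomic_load(&p->version);
        s.tsc_timestamp = atomic_load(&p->tsc_timestamp);
        s.system_time = atomic_load(&p->system_time);
        after = atomic_load(&p->version);
        /* a real guest would place cpu_relax() on the retry path */
    } while ((before & 1) || before != after);

    return s;
}
```

A reader that arrives while version is odd retries, which is the contended-lock case described above; a reader that arrives at any other time is not delayed.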

--Andy


Re: kvmclock doesn't work, help?

2015-12-21 Thread Marcelo Tosatti
On Fri, Dec 18, 2015 at 12:25:11PM -0800, Andy Lutomirski wrote:
> [cc: John Stultz -- maybe you have ideas on how this should best
> integrate with the core code]
> 
> On Fri, Dec 18, 2015 at 11:45 AM, Marcelo Tosatti  wrote:
> > On Fri, Dec 18, 2015 at 11:27:13AM -0800, Andy Lutomirski wrote:
> >> On Fri, Dec 18, 2015 at 3:47 AM, Marcelo Tosatti wrote:
> >> > On Thu, Dec 17, 2015 at 05:12:59PM -0800, Andy Lutomirski wrote:
> >> >> On Thu, Dec 17, 2015 at 11:08 AM, Marcelo Tosatti wrote:
> >> >> > On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
> >> >> >> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti wrote:
> >> >> >> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
> >> >> >> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski wrote:
> >> >> >> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini wrote:
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
> >> >> >> >> >>> >       RAW TSC   NTP corrected TSC
> >> >> >> >> >>> > t0    10        10
> >> >> >> >> >>> > t1    20        19.99
> >> >> >> >> >>> > t2    30        29.98
> >> >> >> >> >>> > t3    40        39.97
> >> >> >> >> >>> > t4    50        49.96
> >> >> >
> >> >> > (1)
> >> >> >
> >> >> >> >> >>> >
> >> >> >> >> >>> > ...
> >> >> >> >> >>> >
> >> >> >> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> >> >> >> >> >>> > you can see what will happen.
> >> >> >> >> >>>
> >> >> >> >> >>> Sure, but why would you ever switch from one to the other?
> >> >> >> >> >>
> >> >> >> >> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
> >> >> >> >> >> resume, the TSC certainly increases at the same rate as before, but the
> >> >> >> >> >> raw TSC restarted counting from 0 and systemtime has increased slower
> >> >> >> >> >> than the guest kvmclock.
> >> >> >> >> >
> >> >> >> >> > Wait, are we talking about the host's NTP or the guest's NTP?
> >> >> >> >> >
> >> >> >> >> > If it's the host's, then wouldn't systemtime be reset after resume to
> >> >> >> >> > the NTP corrected value?  If so, the guest wouldn't see time go
> >> >> >> >> > backwards.
> >> >> >> >> >
> >> >> >> >> > If it's the guest's, then the guest's NTP correction is applied on top
> >> >> >> >> > of kvmclock, and this shouldn't matter.
> >> >> >> >> >
> >> >> >> >> > I still feel like I'm missing something very basic here.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> OK, I think I get it.
> >> >> >> >>
> >> >> >> >> Marcelo, I thought that kvmclock was supposed to propagate the host's
> >> >> >> >> correction to the guest.  If it did, indeed, propagate the correction
> >> >> >> >> then, after resume, the host's new system_time would match the guest's
> >> >> >> >> idea of it (after accounting for the guest's long nap), and I don't
> >> >> >> >> think there would be a problem.
> >> >> >> >> That being said, I can't find the code in the masterclock stuff that
> >> >> >> >> would actually do this.
> >> >> >> >
> >> >> >> > Guest clock is maintained by guest timekeeping code, which does:
> >> >> >> >
> >> >> >> > timer_interrupt()
> >> >> >> > offset = read clocksource since last timer interrupt
> >> >> >> > accumulate_to_systemclock(offset)
> >> >> >> >
> >> >> >> > The frequency correction of NTP in the host can be applied to
> >> >> >> > kvmclock, which will be visible to the guest
> >> >> >> > at "read clocksource since last timer interrupt"
> >> >> >> > (kvmclock_clocksource_read function).
> >> >> >>
> >> >> >> pvclock_clocksource_read?  That seems to do the same thing as all the
> >> >> >> other clocksource access functions.
> >> >> >>
> >> >> >> >
> >> >> >> > This does not mean that the NTP correction in the host is propagated
> >> >> >> > to the guest's system clock directly.
> >> >> >> >
> >> >> >> > (For example, the guest can run NTP which is free to do further
> >> >> >> > adjustments at "accumulate_to_systemclock(offset)" time).
> >> >> >>
> >> >> >> Of course.  But I expected that, in the absence of NTP on the guest,
> >> >> >> that the guest would track the host's *corrected* time.
> >> >> >>
> >> >> >> >
> >> >> >> >> If, on the other hand, the host's NTP correction is not supposed to
> >> >> >> >> propagate to the guest,
> >> >> >> >
> >> >> >> > This is optional. There is a module option to control this, in fact.
> >> >> >> >
> >> >> >> > It's nice to have, because then you can execute a guest without NTP
> >> >> >> > (say without network connection), and 

Re: kvmclock doesn't work, help?

2015-12-18 Thread Andy Lutomirski
[cc: John Stultz -- maybe you have ideas on how this should best
integrate with the core code]

On Fri, Dec 18, 2015 at 11:45 AM, Marcelo Tosatti  wrote:
> On Fri, Dec 18, 2015 at 11:27:13AM -0800, Andy Lutomirski wrote:
>> On Fri, Dec 18, 2015 at 3:47 AM, Marcelo Tosatti  wrote:
>> > On Thu, Dec 17, 2015 at 05:12:59PM -0800, Andy Lutomirski wrote:
>> >> On Thu, Dec 17, 2015 at 11:08 AM, Marcelo Tosatti wrote:
>> >> > On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
>> >> >> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti wrote:
>> >> >> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
>> >> >> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski wrote:
>> >> >> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini wrote:
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
>> >> >> >> >>> >       RAW TSC   NTP corrected TSC
>> >> >> >> >>> > t0    10        10
>> >> >> >> >>> > t1    20        19.99
>> >> >> >> >>> > t2    30        29.98
>> >> >> >> >>> > t3    40        39.97
>> >> >> >> >>> > t4    50        49.96
>> >> >
>> >> > (1)
>> >> >
>> >> >> >> >>> >
>> >> >> >> >>> > ...
>> >> >> >> >>> >
>> >> >> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
>> >> >> >> >>> > you can see what will happen.
>> >> >> >> >>>
>> >> >> >> >>> Sure, but why would you ever switch from one to the other?
>> >> >> >> >>
>> >> >> >> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
>> >> >> >> >> resume, the TSC certainly increases at the same rate as before, but the
>> >> >> >> >> raw TSC restarted counting from 0 and systemtime has increased slower
>> >> >> >> >> than the guest kvmclock.
>> >> >> >> >
>> >> >> >> > Wait, are we talking about the host's NTP or the guest's NTP?
>> >> >> >> >
>> >> >> >> > If it's the host's, then wouldn't systemtime be reset after resume to
>> >> >> >> > the NTP corrected value?  If so, the guest wouldn't see time go
>> >> >> >> > backwards.
>> >> >> >> >
>> >> >> >> > If it's the guest's, then the guest's NTP correction is applied on top
>> >> >> >> > of kvmclock, and this shouldn't matter.
>> >> >> >> >
>> >> >> >> > I still feel like I'm missing something very basic here.
>> >> >> >> >
>> >> >> >>
>> >> >> >> OK, I think I get it.
>> >> >> >>
>> >> >> >> Marcelo, I thought that kvmclock was supposed to propagate the host's
>> >> >> >> correction to the guest.  If it did, indeed, propagate the correction
>> >> >> >> then, after resume, the host's new system_time would match the guest's
>> >> >> >> idea of it (after accounting for the guest's long nap), and I don't
>> >> >> >> think there would be a problem.
>> >> >> >> That being said, I can't find the code in the masterclock stuff that
>> >> >> >> would actually do this.
>> >> >> >
>> >> >> > Guest clock is maintained by guest timekeeping code, which does:
>> >> >> >
>> >> >> > timer_interrupt()
>> >> >> > offset = read clocksource since last timer interrupt
>> >> >> > accumulate_to_systemclock(offset)
>> >> >> >
>> >> >> > The frequency correction of NTP in the host can be applied to
>> >> >> > kvmclock, which will be visible to the guest
>> >> >> > at "read clocksource since last timer interrupt"
>> >> >> > (kvmclock_clocksource_read function).
>> >> >>
>> >> >> pvclock_clocksource_read?  That seems to do the same thing as all the
>> >> >> other clocksource access functions.
>> >> >>
>> >> >> >
>> >> >> > This does not mean that the NTP correction in the host is propagated
>> >> >> > to the guest's system clock directly.
>> >> >> >
>> >> >> > (For example, the guest can run NTP which is free to do further
>> >> >> > adjustments at "accumulate_to_systemclock(offset)" time).
>> >> >>
>> >> >> Of course.  But I expected that, in the absence of NTP on the guest,
>> >> >> that the guest would track the host's *corrected* time.
>> >> >>
>> >> >> >
>> >> >> >> If, on the other hand, the host's NTP correction is not supposed to
>> >> >> >> propagate to the guest,
>> >> >> >
>> >> >> > This is optional. There is a module option to control this, in fact.
>> >> >> >
>> >> >> > It's nice to have, because then you can execute a guest without NTP
>> >> >> > (say without network connection), and have a kvmclock (kvmclock is a
>> >> >> > clocksource, not a guest system clock) which is NTP corrected.
>> >> >>
>> >> >> Can you point to how this works?  I found kvm_guest_time_update, which
>> >> >> is called under circumstances that I haven't untangled.  I can't
>> >> >> really tell what it's trying to do.
>> >> >
>> >> > 

Re: kvmclock doesn't work, help?

2015-12-18 Thread Marcelo Tosatti
On Fri, Dec 18, 2015 at 11:27:13AM -0800, Andy Lutomirski wrote:
> On Fri, Dec 18, 2015 at 3:47 AM, Marcelo Tosatti  wrote:
> > On Thu, Dec 17, 2015 at 05:12:59PM -0800, Andy Lutomirski wrote:
> >> On Thu, Dec 17, 2015 at 11:08 AM, Marcelo Tosatti wrote:
> >> > On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
> >> >> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti wrote:
> >> >> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
> >> >> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski wrote:
> >> >> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini wrote:
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
> >> >> >> >>> >       RAW TSC   NTP corrected TSC
> >> >> >> >>> > t0    10        10
> >> >> >> >>> > t1    20        19.99
> >> >> >> >>> > t2    30        29.98
> >> >> >> >>> > t3    40        39.97
> >> >> >> >>> > t4    50        49.96
> >> >
> >> > (1)
> >> >
> >> >> >> >>> >
> >> >> >> >>> > ...
> >> >> >> >>> >
> >> >> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> >> >> >> >>> > you can see what will happen.
> >> >> >> >>>
> >> >> >> >>> Sure, but why would you ever switch from one to the other?
> >> >> >> >>
> >> >> >> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
> >> >> >> >> resume, the TSC certainly increases at the same rate as before, but the
> >> >> >> >> raw TSC restarted counting from 0 and systemtime has increased slower
> >> >> >> >> than the guest kvmclock.
> >> >> >> >
> >> >> >> > Wait, are we talking about the host's NTP or the guest's NTP?
> >> >> >> >
> >> >> >> > If it's the host's, then wouldn't systemtime be reset after resume to
> >> >> >> > the NTP corrected value?  If so, the guest wouldn't see time go
> >> >> >> > backwards.
> >> >> >> >
> >> >> >> > If it's the guest's, then the guest's NTP correction is applied on top
> >> >> >> > of kvmclock, and this shouldn't matter.
> >> >> >> >
> >> >> >> > I still feel like I'm missing something very basic here.
> >> >> >> >
> >> >> >>
> >> >> >> OK, I think I get it.
> >> >> >>
> >> >> >> Marcelo, I thought that kvmclock was supposed to propagate the host's
> >> >> >> correction to the guest.  If it did, indeed, propagate the correction
> >> >> >> then, after resume, the host's new system_time would match the guest's
> >> >> >> idea of it (after accounting for the guest's long nap), and I don't
> >> >> >> think there would be a problem.
> >> >> >> That being said, I can't find the code in the masterclock stuff that
> >> >> >> would actually do this.
> >> >> >
> >> >> > Guest clock is maintained by guest timekeeping code, which does:
> >> >> >
> >> >> > timer_interrupt()
> >> >> > offset = read clocksource since last timer interrupt
> >> >> > accumulate_to_systemclock(offset)
> >> >> >
> >> >> > The frequency correction of NTP in the host can be applied to
> >> >> > kvmclock, which will be visible to the guest
> >> >> > at "read clocksource since last timer interrupt"
> >> >> > (kvmclock_clocksource_read function).
> >> >>
> >> >> pvclock_clocksource_read?  That seems to do the same thing as all the
> >> >> other clocksource access functions.
> >> >>
> >> >> >
> >> >> > This does not mean that the NTP correction in the host is propagated
> >> >> > to the guest's system clock directly.
> >> >> >
> >> >> > (For example, the guest can run NTP which is free to do further
> >> >> > adjustments at "accumulate_to_systemclock(offset)" time).
> >> >>
> >> >> Of course.  But I expected that, in the absence of NTP on the guest,
> >> >> that the guest would track the host's *corrected* time.
> >> >>
> >> >> >
> >> >> >> If, on the other hand, the host's NTP correction is not supposed to
> >> >> >> propagate to the guest,
> >> >> >
> >> >> > This is optional. There is a module option to control this, in fact.
> >> >> >
> >> >> > It's nice to have, because then you can execute a guest without NTP
> >> >> > (say without network connection), and have a kvmclock (kvmclock is a
> >> >> > clocksource, not a guest system clock) which is NTP corrected.
> >> >>
> >> >> Can you point to how this works?  I found kvm_guest_time_update, which
> >> >> is called under circumstances that I haven't untangled.  I can't
> >> >> really tell what it's trying to do.
> >> >
> >> > Documentation/virtual/kvm/timekeeping.txt.
> >> >
> >>
> >> That document is really long.  I skimmed it and found nothing.
> >
> > kvm_guest_time_update is called when KVM_REQ_UPDATE_CLOCK is set.
> >
> > This happens when:
> > - kvmclock is enabled or disabled by the guest.
> > - periodically to propagate NTP 

Re: kvmclock doesn't work, help?

2015-12-18 Thread Andy Lutomirski
On Fri, Dec 18, 2015 at 3:47 AM, Marcelo Tosatti  wrote:
> On Thu, Dec 17, 2015 at 05:12:59PM -0800, Andy Lutomirski wrote:
>> On Thu, Dec 17, 2015 at 11:08 AM, Marcelo Tosatti wrote:
>> > On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
>> >> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti wrote:
>> >> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
>> >> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski wrote:
>> >> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini wrote:
>> >> >> >>
>> >> >> >>
>> >> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
>> >> >> >>> >       RAW TSC   NTP corrected TSC
>> >> >> >>> > t0    10        10
>> >> >> >>> > t1    20        19.99
>> >> >> >>> > t2    30        29.98
>> >> >> >>> > t3    40        39.97
>> >> >> >>> > t4    50        49.96
>> >
>> > (1)
>> >
>> >> >> >>> >
>> >> >> >>> > ...
>> >> >> >>> >
>> >> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
>> >> >> >>> > you can see what will happen.
>> >> >> >>>
>> >> >> >>> Sure, but why would you ever switch from one to the other?
>> >> >> >>
>> >> >> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
>> >> >> >> resume, the TSC certainly increases at the same rate as before, but the
>> >> >> >> raw TSC restarted counting from 0 and systemtime has increased slower
>> >> >> >> than the guest kvmclock.
>> >> >> >
>> >> >> > Wait, are we talking about the host's NTP or the guest's NTP?
>> >> >> >
>> >> >> > If it's the host's, then wouldn't systemtime be reset after resume to
>> >> >> > the NTP corrected value?  If so, the guest wouldn't see time go
>> >> >> > backwards.
>> >> >> >
>> >> >> > If it's the guest's, then the guest's NTP correction is applied on top
>> >> >> > of kvmclock, and this shouldn't matter.
>> >> >> >
>> >> >> > I still feel like I'm missing something very basic here.
>> >> >> >
>> >> >>
>> >> >> OK, I think I get it.
>> >> >>
>> >> >> Marcelo, I thought that kvmclock was supposed to propagate the host's
>> >> >> correction to the guest.  If it did, indeed, propagate the correction
>> >> >> then, after resume, the host's new system_time would match the guest's
>> >> >> idea of it (after accounting for the guest's long nap), and I don't
>> >> >> think there would be a problem.
>> >> >> That being said, I can't find the code in the masterclock stuff that
>> >> >> would actually do this.
>> >> >
>> >> > Guest clock is maintained by guest timekeeping code, which does:
>> >> >
>> >> > timer_interrupt()
>> >> > offset = read clocksource since last timer interrupt
>> >> > accumulate_to_systemclock(offset)
>> >> >
>> >> > The frequency correction of NTP in the host can be applied to
>> >> > kvmclock, which will be visible to the guest
>> >> > at "read clocksource since last timer interrupt"
>> >> > (kvmclock_clocksource_read function).
>> >>
>> >> pvclock_clocksource_read?  That seems to do the same thing as all the
>> >> other clocksource access functions.
>> >>
>> >> >
>> >> > This does not mean that the NTP correction in the host is propagated
>> >> > to the guest's system clock directly.
>> >> >
>> >> > (For example, the guest can run NTP which is free to do further
>> >> > adjustments at "accumulate_to_systemclock(offset)" time).
>> >>
>> >> Of course.  But I expected that, in the absence of NTP on the guest,
>> >> that the guest would track the host's *corrected* time.
>> >>
>> >> >
>> >> >> If, on the other hand, the host's NTP correction is not supposed to
>> >> >> propagate to the guest,
>> >> >
>> >> > This is optional. There is a module option to control this, in fact.
>> >> >
>> >> > It's nice to have, because then you can execute a guest without NTP
>> >> > (say without network connection), and have a kvmclock (kvmclock is a
>> >> > clocksource, not a guest system clock) which is NTP corrected.
>> >>
>> >> Can you point to how this works?  I found kvm_guest_time_update, which
>> >> is called under circumstances that I haven't untangled.  I can't
>> >> really tell what it's trying to do.
>> >
>> > Documentation/virtual/kvm/timekeeping.txt.
>> >
>>
>> That document is really long.  I skimmed it and found nothing.
>
> kvm_guest_time_update is called when KVM_REQ_UPDATE_CLOCK is set.
>
> This happens when:
> - kvmclock is enabled or disabled by the guest.
> - periodically to propagate NTP correction to kvmclock.
> - guest vcpu switching between host pcpus when TSCs are out of sync.
> - after migration.
> - after savevm/loadvm.
>
>> >> In any case, this still seems much more convoluted than it has to be.
>> >> In the case in which the host has a stable TSC (tsc is selected in the
>> 

Re: kvmclock doesn't work, help?

2015-12-18 Thread Marcelo Tosatti
On Thu, Dec 17, 2015 at 05:12:59PM -0800, Andy Lutomirski wrote:
> On Thu, Dec 17, 2015 at 11:08 AM, Marcelo Tosatti  wrote:
> > On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
> >> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti wrote:
> >> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
> >> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski wrote:
> >> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini wrote:
> >> >> >>
> >> >> >>
> >> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
> >> >> >>> >       RAW TSC   NTP corrected TSC
> >> >> >>> > t0    10        10
> >> >> >>> > t1    20        19.99
> >> >> >>> > t2    30        29.98
> >> >> >>> > t3    40        39.97
> >> >> >>> > t4    50        49.96
> >
> > (1)
> >
> >> >> >>> >
> >> >> >>> > ...
> >> >> >>> >
> >> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> >> >> >>> > you can see what will happen.
> >> >> >>>
> >> >> >>> Sure, but why would you ever switch from one to the other?
> >> >> >>
> >> >> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
> >> >> >> resume, the TSC certainly increases at the same rate as before, but the
> >> >> >> raw TSC restarted counting from 0 and systemtime has increased slower
> >> >> >> than the guest kvmclock.
> >> >> >
> >> >> > Wait, are we talking about the host's NTP or the guest's NTP?
> >> >> >
> >> >> > If it's the host's, then wouldn't systemtime be reset after resume to
> >> >> > the NTP corrected value?  If so, the guest wouldn't see time go
> >> >> > backwards.
> >> >> >
> >> >> > If it's the guest's, then the guest's NTP correction is applied on top
> >> >> > of kvmclock, and this shouldn't matter.
> >> >> >
> >> >> > I still feel like I'm missing something very basic here.
> >> >> >
> >> >>
> >> >> OK, I think I get it.
> >> >>
> >> >> Marcelo, I thought that kvmclock was supposed to propagate the host's
> >> >> correction to the guest.  If it did, indeed, propagate the correction
> >> >> then, after resume, the host's new system_time would match the guest's
> >> >> idea of it (after accounting for the guest's long nap), and I don't
> >> >> think there would be a problem.
> >> >> That being said, I can't find the code in the masterclock stuff that
> >> >> would actually do this.
> >> >
> >> > Guest clock is maintained by guest timekeeping code, which does:
> >> >
> >> > timer_interrupt()
> >> > offset = read clocksource since last timer interrupt
> >> > accumulate_to_systemclock(offset)
> >> >
> >> > The frequency correction of NTP in the host can be applied to
> >> > kvmclock, which will be visible to the guest
> >> > at "read clocksource since last timer interrupt"
> >> > (kvmclock_clocksource_read function).
> >>
> >> pvclock_clocksource_read?  That seems to do the same thing as all the
> >> other clocksource access functions.
> >>
> >> >
> >> > This does not mean that the NTP correction in the host is propagated
> >> > to the guest's system clock directly.
> >> >
> >> > (For example, the guest can run NTP which is free to do further
> >> > adjustments at "accumulate_to_systemclock(offset)" time).
> >>
> >> Of course.  But I expected that, in the absence of NTP on the guest,
> >> that the guest would track the host's *corrected* time.
> >>
> >> >
> >> >> If, on the other hand, the host's NTP correction is not supposed to
> >> >> propagate to the guest,
> >> >
> >> > This is optional. There is a module option to control this, in fact.
> >> >
> >> > It's nice to have, because then you can execute a guest without NTP
> >> > (say without network connection), and have a kvmclock (kvmclock is a
> >> > clocksource, not a guest system clock) which is NTP corrected.
> >>
> >> Can you point to how this works?  I found kvm_guest_time_update, which
> >> is called under circumstances that I haven't untangled.  I can't
> >> really tell what it's trying to do.
> >
> > Documentation/virtual/kvm/timekeeping.txt.
> >
> 
> That document is really long.  I skimmed it and found nothing.

kvm_guest_time_update is called when KVM_REQ_UPDATE_CLOCK is set.

This happens when:
- kvmclock is enabled or disabled by the guest.
- periodically to propagate NTP correction to kvmclock.
- guest vcpu switching between host pcpus when TSCs are out of sync.
- after migration.
- after savevm/loadvm.
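
[Archive note] The relationship between the request bit and the update can be pictured with a toy model. It assumes nothing about KVM internals beyond what is stated above: REQ_UPDATE_CLOCK_SKETCH, struct vcpu_sketch and the functions below are hypothetical stand-ins, and the counter stands in for the call to kvm_guest_time_update.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request bit standing in for KVM_REQ_UPDATE_CLOCK. */
#define REQ_UPDATE_CLOCK_SKETCH (1u << 0)

struct vcpu_sketch {
    atomic_uint requests;        /* pending request bits */
    unsigned int clock_updates;  /* how many times the "update" ran */
};

/* Any of the listed events (migration, NTP propagation, ...) sets the bit. */
static void make_request(struct vcpu_sketch *v, unsigned int req)
{
    atomic_fetch_or(&v->requests, req);
}

/* Test and clear the bit atomically; true if it was pending. */
static bool check_request(struct vcpu_sketch *v, unsigned int req)
{
    return (atomic_fetch_and(&v->requests, ~req) & req) != 0;
}

/* Before running the guest, service a pending clock-update request. */
static void enter_guest_sketch(struct vcpu_sketch *v)
{
    if (check_request(v, REQ_UPDATE_CLOCK_SKETCH))
        v->clock_updates++;      /* stands in for kvm_guest_time_update */
}
```

In this model, several events that set the bit before the next guest entry collapse into a single update.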

> >> In any case, this still seems much more convoluted than it has to be.
> >> In the case in which the host has a stable TSC (tsc is selected in the
> >> core timekeeping code, VCLOCK_TSC is set, etc), which is basically all
> >> the time on the last few generations of CPUs, then the core
> >> timekeeping code is already exposing a linear function that's supposed
> >> to be 

Re: kvmclock doesn't work, help?

2015-12-18 Thread John Stultz
On Fri, Dec 18, 2015 at 12:25 PM, Andy Lutomirski  wrote:
> [cc: John Stultz -- maybe you have ideas on how this should best
> integrate with the core code]
>
> On Fri, Dec 18, 2015 at 11:45 AM, Marcelo Tosatti  wrote:
>> On Fri, Dec 18, 2015 at 11:27:13AM -0800, Andy Lutomirski wrote:
>>> On Fri, Dec 18, 2015 at 3:47 AM, Marcelo Tosatti  
>>> wrote:
>>> > On Thu, Dec 17, 2015 at 05:12:59PM -0800, Andy Lutomirski wrote:
>>> >> On Thu, Dec 17, 2015 at 11:08 AM, Marcelo Tosatti  
>>> >> wrote:
>>> >> > On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
>>> >> >> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti 
>>> >> >>  wrote:
>>> >> >> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
>>> >> >> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski 
>>> >> >> >>  wrote:
>>> >> >> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini 
>>> >> >> >> >  wrote:
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
>>> >> >> >> >>> > RAW TSC NTP corrected TSC
>>> >> >> >> >>> > t0  10  10
>>> >> >> >> >>> > t1  20  19.99
>>> >> >> >> >>> > t2  30  29.98
>>> >> >> >> >>> > t3  40  39.97
>>> >> >> >> >>> > t4  50  49.96
>>> >> >
>>> >> > (1)
>>> >> >
>>> >> >> >> >>> >
>>> >> >> >> >>> > ...
>>> >> >> >> >>> >
>>> >> >> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
>>> >> >> >> >>> > you can see what will happen.
>>> >> >> >> >>>
>>> >> >> >> >>> Sure, but why would you ever switch from one to the other?
>>> >> >> >> >>
>>> >> >> >> >> The guest uses the raw TSC and systemtime = 0 until suspend.  
>>> >> >> >> >> After
>>> >> >> >> >> resume, the TSC certainly increases at the same rate as before, 
>>> >> >> >> >> but the
>>> >> >> >> >> raw TSC restarted counting from 0 and systemtime has increased 
>>> >> >> >> >> slower
>>> >> >> >> >> than the guest kvmclock.
>>> >> >> >> >
>>> >> >> >> > Wait, are we talking about the host's NTP or the guest's NTP?
>>> >> >> >> >
>>> >> >> >> > If it's the host's, then wouldn't systemtime be reset after 
>>> >> >> >> > resume to
>>> >> >> >> > the NTP corrected value?  If so, the guest wouldn't see time go
>>> >> >> >> > backwards.
>>> >> >> >> >
>>> >> >> >> > If it's the guest's, then the guest's NTP correction is applied 
>>> >> >> >> > on top
>>> >> >> >> > of kvmclock, and this shouldn't matter.
>>> >> >> >> >
>>> >> >> >> > I still feel like I'm missing something very basic here.
>>> >> >> >> >
>>> >> >> >>
>>> >> >> >> OK, I think I get it.
>>> >> >> >>
>>> >> >> >> Marcelo, I thought that kvmclock was supposed to propagate the 
>>> >> >> >> host's
>>> >> >> >> correction to the guest.  If it did, indeed, propagate the 
>>> >> >> >> correction
>>> >> >> >> then, after resume, the host's new system_time would match the 
>>> >> >> >> guest's
>>> >> >> >> idea of it (after accounting for the guest's long nap), and I don't
>>> >> >> >> think there would be a problem.
>>> >> >> >> That being said, I can't find the code in the masterclock stuff 
>>> >> >> >> that
>>> >> >> >> would actually do this.
>>> >> >> >
>>> >> >> > Guest clock is maintained by guest timekeeping code, which does:
>>> >> >> >
>>> >> >> > timer_interrupt()
>>> >> >> > offset = read clocksource since last timer interrupt
>>> >> >> > accumulate_to_systemclock(offset)
>>> >> >> >
>>> >> >> > The frequency correction of NTP in the host can be applied to
>>> >> >> > kvmclock, which will be visible to the guest
>>> >> >> > at "read clocksource since last timer interrupt"
>>> >> >> > (kvmclock_clocksource_read function).
>>> >> >>
>>> >> >> pvclock_clocksource_read?  That seems to do the same thing as all the
>>> >> >> other clocksource access functions.
>>> >> >>
>>> >> >> >
>>> >> >> > This does not mean that the NTP correction in the host is propagated
>>> >> >> > to the guest's system clock directly.
>>> >> >> >
>>> >> >> > (For example, the guest can run NTP which is free to do further
>>> >> >> > adjustments at "accumulate_to_systemclock(offset)" time).
>>> >> >>
>>> >> >> Of course.  But I expected that, in the absence of NTP on the guest,
>>> >> >> the guest would track the host's *corrected* time.
>>> >> >>
>>> >> >> >
>>> >> >> >> If, on the other hand, the host's NTP correction is not supposed to
>>> >> >> >> propagate to the guest,
>>> >> >> >
>>> >> >> > This is optional. There is a module option to control this, in fact.
>>> >> >> >
>>> >> >> > It's nice to have, because then you can execute a guest without NTP
>>> >> >> > (say without network connection), and have a kvmclock (kvmclock is a
>>> >> >> > clocksource, not a guest system clock) which is NTP corrected.
>>> >> >>
>>> >> >> Can you point to how 

Re: kvmclock doesn't work, help?

2015-12-17 Thread Marcelo Tosatti
On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski  wrote:
> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini  wrote:
> >>
> >>
> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
> >>> > RAW TSC NTP corrected TSC
> >>> > t0  10  10
> >>> > t1  20  19.99
> >>> > t2  30  29.98
> >>> > t3  40  39.97
> >>> > t4  50  49.96
> >>> >
> >>> > ...
> >>> >
> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> >>> > you can see what will happen.
> >>>
> >>> Sure, but why would you ever switch from one to the other?
> >>
> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
> >> resume, the TSC certainly increases at the same rate as before, but the
> >> raw TSC restarted counting from 0 and systemtime has increased slower
> >> than the guest kvmclock.
> >
> > Wait, are we talking about the host's NTP or the guest's NTP?
> >
> > If it's the host's, then wouldn't systemtime be reset after resume to
> > the NTP corrected value?  If so, the guest wouldn't see time go
> > backwards.
> >
> > If it's the guest's, then the guest's NTP correction is applied on top
> > of kvmclock, and this shouldn't matter.
> >
> > I still feel like I'm missing something very basic here.
> >
> 
> OK, I think I get it.
> 
> Marcelo, I thought that kvmclock was supposed to propagate the host's
> correction to the guest.  If it did, indeed, propagate the correction
> then, after resume, the host's new system_time would match the guest's
> idea of it (after accounting for the guest's long nap), and I don't
> think there would be a problem.
> That being said, I can't find the code in the masterclock stuff that
> would actually do this.

Guest clock is maintained by guest timekeeping code, which does:

timer_interrupt() 
offset = read clocksource since last timer interrupt
accumulate_to_systemclock(offset)

The frequency correction of NTP in the host can be applied to 
kvmclock, which will be visible to the guest 
at "read clocksource since last timer interrupt" 
(kvmclock_clocksource_read function).

This does not mean that the NTP correction in the host is propagated
to the guest's system clock directly.

(For example, the guest can run NTP which is free to do further
adjustments at "accumulate_to_systemclock(offset)" time).
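A minimal sketch of the guest-side loop described above, assuming the clocksource already returns nanoseconds (as kvmclock does). The struct and function names are hypothetical and are not the kernel's timekeeping API; the point is only that the guest's system clock is built from deltas of clocksource reads, so any host-side frequency correction reaches it solely through those reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical guest timekeeping state, for illustration only. */
struct guest_timekeeper {
    uint64_t last_read_ns;    /* clocksource value at the previous timer interrupt */
    uint64_t system_clock_ns; /* guest system clock accumulated so far */
};

/* Called on each timer interrupt with the current clocksource reading. */
void sketch_timer_interrupt(struct guest_timekeeper *tk, uint64_t now_ns)
{
    /* The clocksource itself must never go backwards. */
    assert(now_ns >= tk->last_read_ns);

    /* offset = read clocksource since last timer interrupt */
    uint64_t offset = now_ns - tk->last_read_ns;
    tk->last_read_ns = now_ns;

    /* accumulate_to_systemclock(offset); a guest NTP daemon could scale
     * the offset here, which this sketch does not model. */
    tk->system_clock_ns += offset;
}
```
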

> If, on the other hand, the host's NTP correction is not supposed to
> propagate to the guest, 

This is optional. There is a module option to control this, in fact.

It's nice to have, because then you can execute a guest without NTP
(say without network connection), and have a kvmclock (kvmclock is a
clocksource, not a guest system clock) which is NTP corrected.

> then shouldn't KVM just update system_time on
> resume to whatever the guest would think it had (which I think would
> be equivalent to the host's CLOCK_MONOTONIC_RAW value, possibly
> shifted by some per-guest constant offset).
> 
> --Andy

Sure, you could add a correction to compensate and make sure 
the guest clock does not see time backwards.
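One possible shape for such a correction, as a rough sketch only: keep a per-guest offset and grow it whenever the freshly read host time would fall behind the last value the guest could have observed. All names here are hypothetical and nothing below is taken from the KVM code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-guest state: constant offset added to the host clock. */
struct resume_fixup {
    int64_t offset_ns;
};

/*
 * Compute the system_time to publish after resume.
 * last_guest_ns: the largest clock value the guest may have seen before suspend.
 * host_ns:       the host clock value read after resume.
 */
int64_t compensate_on_resume(struct resume_fixup *f,
                             int64_t last_guest_ns, int64_t host_ns)
{
    assert(f);

    int64_t candidate = host_ns + f->offset_ns;
    if (candidate < last_guest_ns) {
        /* Host clock is behind the guest's view: absorb the gap into the offset. */
        f->offset_ns += last_guest_ns - candidate;
        candidate = last_guest_ns;
    }
    return candidate;
}
```

The offset only ever grows, so the guest clock ends up permanently ahead of the host clock by the accumulated gap; whether that is acceptable is a policy question this sketch does not answer.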

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmclock doesn't work, help?

2015-12-17 Thread Andy Lutomirski
On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti  wrote:
> On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
>> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski  wrote:
>> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini  
>> > wrote:
>> >>
>> >>
>> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
>> >>> > RAW TSC NTP corrected TSC
>> >>> > t0  10  10
>> >>> > t1  20  19.99
>> >>> > t2  30  29.98
>> >>> > t3  40  39.97
>> >>> > t4  50  49.96
>> >>> >
>> >>> > ...
>> >>> >
>> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
>> >>> > you can see what will happen.
>> >>>
>> >>> Sure, but why would you ever switch from one to the other?
>> >>
>> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
>> >> resume, the TSC certainly increases at the same rate as before, but the
>> >> raw TSC restarted counting from 0 and systemtime has increased slower
>> >> than the guest kvmclock.
>> >
>> > Wait, are we talking about the host's NTP or the guest's NTP?
>> >
>> > If it's the host's, then wouldn't systemtime be reset after resume to
>> > the NTP corrected value?  If so, the guest wouldn't see time go
>> > backwards.
>> >
>> > If it's the guest's, then the guest's NTP correction is applied on top
>> > of kvmclock, and this shouldn't matter.
>> >
>> > I still feel like I'm missing something very basic here.
>> >
>>
>> OK, I think I get it.
>>
>> Marcelo, I thought that kvmclock was supposed to propagate the host's
>> correction to the guest.  If it did, indeed, propagate the correction
>> then, after resume, the host's new system_time would match the guest's
>> idea of it (after accounting for the guest's long nap), and I don't
>> think there would be a problem.
>> That being said, I can't find the code in the masterclock stuff that
>> would actually do this.
>
> Guest clock is maintained by guest timekeeping code, which does:
>
> timer_interrupt()
> offset = read clocksource since last timer interrupt
> accumulate_to_systemclock(offset)
>
> The frequency correction of NTP in the host can be applied to
> kvmclock, which will be visible to the guest
> at "read clocksource since last timer interrupt"
> (kvmclock_clocksource_read function).

pvclock_clocksource_read?  That seems to do the same thing as all the
other clocksource access functions.

>
> This does not mean that the NTP correction in the host is propagated
> to the guest's system clock directly.
>
> (For example, the guest can run NTP which is free to do further
> adjustments at "accumulate_to_systemclock(offset)" time).

Of course.  But I expected that, in the absence of NTP on the guest,
the guest would track the host's *corrected* time.

>
>> If, on the other hand, the host's NTP correction is not supposed to
>> propagate to the guest,
>
> This is optional. There is a module option to control this, in fact.
>
> It's nice to have, because then you can execute a guest without NTP
> (say without network connection), and have a kvmclock (kvmclock is a
> clocksource, not a guest system clock) which is NTP corrected.

Can you point to how this works?  I found kvm_guest_time_update, which
is called under circumstances that I haven't untangled.  I can't
really tell what it's trying to do.

In any case, this still seems much more convoluted than it has to be.
In the case in which the host has a stable TSC (tsc is selected in the
core timekeeping code, VCLOCK_TSC is set, etc), which is basically all
the time on the last few generations of CPUs, then the core
timekeeping code is already exposing a linear function that's supposed
to be used for monotonic, cpu-local access to a corrected nanosecond
counter.  It's even in pretty much exactly the right form to pass
through to the guest via pvclock in the gtod data.  Why doesn't KVM
pass it through verbatim, updated in real time?  Is there some legacy
reason that KVM must apply its own corrections and has to jump through
hoops to pause vcpus when updating those vcpu's copies of the pvclock
data?

>
>> then shouldn't KVM just update system_time on
>> resume to whatever the guest would think it had (which I think would
>> be equivalent to the host's CLOCK_MONOTONIC_RAW value, possibly
>> shifted by some per-guest constant offset).
>>
>> --Andy
>
> Sure, you could add a correction to compensate and make sure
> the guest clock does not see time backwards.
>

Could you help do that?  You understand the code far better than I do.

As it stands, it simply doesn't work on any system that suspends and
resumes (unless maybe the system has the upcoming Intel ART feature,
and I have no clue when that'll show up).

--Andy

Re: kvmclock doesn't work, help?

2015-12-17 Thread Marcelo Tosatti
On Mon, Dec 14, 2015 at 02:31:10PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 14, 2015 at 2:00 PM, Marcelo Tosatti  wrote:
> > On Mon, Dec 14, 2015 at 02:44:15PM +0100, Paolo Bonzini wrote:
> >>
> >>
> >> On 11/12/2015 22:57, Andy Lutomirski wrote:
> >> > I'm still not seeing the issue.
> >> >
> >> > The formula is:
> >> >
> >> > (((rdtsc - pvti->tsc_timestamp) * pvti->tsc_to_system_mul) >>
> >> > pvti->tsc_shift) + pvti->system_time
> >> >
> >> > Obviously, if you reset pvti->tsc_timestamp to the current tsc value
> >> > after suspend/resume, you would also need to update system_time.
> >> >
> >> > I don't see what this has to do with suspend/resume or with whether
> >> > the effective scale factor is greater than or less than one.  The only
> >> > suspend/resume interaction I can see is that, if the host allows the
> >> > guest-observed TSC value to jump (which is arguably a bug, but that's
> >> > not important here), it needs to update pvti before resuming the
> >> > guest.
> >>
> >> Which is not an issue, since freezing obviously gets all CPUs out of
> >> guest mode.
> >>
> >> Marcelo, can you provide an example with made-up values for tsc and pvti?
> >
> > I meant "systemtime" at ^.
> >
> > guest visible clock = systemtime (updated at time 0, guest initialization) 
> > + scaled tsc reads=LARGE VALUE.
> >   ^^
> > guest reads clock to memory at location A = scaled tsc read.
> > -> suspend resume event
> > guest visible clock = systemtime (updated at time AFTER SUSPEND) + scaled 
> > tsc reads=0.
> > guest reads clock to memory at location B.
> >
> > So before the suspend/resume event, the clock is the RAW TSC values
> > (scaled by kvmclock, but the frequency of the RAW TSC).
> >
> > After suspend/resume event, the clock is updated from the host
> > via get_kernel_ns(), which reads the corrected NTP frequency TSC.
> >
> > So you switch the timebase, from a clock running at a given frequency,
> > to a clock running at another frequency (effective frequency).
> >
> > Example:
> >
> >       RAW TSC   NTP corrected TSC
> > t0    10        10
> > t1    20        19.99
> > t2    30        29.98
> > t3    40        39.97
> > t4    50        49.96
> >
> > ...
> >
> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> > you can see what will happen.
> 
> Sure, but why would you ever switch from one to the other? 

Because that's what happens when you ask kvmclock to update from system
time (which is a reliable clock, resistant to suspend/resume issues).
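To make the discontinuity concrete, here is a small sketch that replays the table above in integer hundredths of a time unit (so 49.96 becomes 4996). It assumes the simplified model from this thread: the guest clock is system_time captured at the last update plus the TSC delta since then. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Table values, in hundredths: raw advances 10.00 per step, corrected 9.99. */
int64_t raw_clock(int step)       { return 1000 + 1000 * (int64_t)step; }
int64_t corrected_clock(int step) { return 1000 +  999 * (int64_t)step; }

/* Guest view while system_time is still the value captured at step 0:
 * the clock advances at the raw TSC rate. */
int64_t guest_clock_before_update(int step)
{
    assert(step >= 0);
    int64_t system_time = corrected_clock(0);
    int64_t tsc_delta = raw_clock(step) - raw_clock(0);
    return system_time + tsc_delta;
}

/* Guest view immediately after kvmclock is refreshed from the host's
 * NTP-corrected clock at the same step: the TSC delta restarts at 0. */
int64_t guest_clock_after_update(int step)
{
    assert(step >= 0);
    return corrected_clock(step);
}

/* How far the guest clock steps backwards across the update. */
int64_t backwards_jump(int step)
{
    return guest_clock_before_update(step) - guest_clock_after_update(step);
}
```

At t4 the guest has already read 50.00, and the refreshed clock reports 49.96, so two reads straddling the update differ by -0.04.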

>  I'm still not seeing the scenario under which this discontinuity is
> visible to anything other than the kvmclock code itself.

Host userspace can see if it uses TSC and clock_gettime()
and expects them to run hand in hand.

> The only things that need to be monotonic are the output from
> vread_pvclock and the in-kernel equivalent, I think.
> 
> --Andy

clock_gettime should be monotonic as well.



Re: kvmclock doesn't work, help?

2015-12-17 Thread Marcelo Tosatti
On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti  wrote:
> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski  
> >> wrote:
> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini  
> >> > wrote:
> >> >>
> >> >>
> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
> >> >>> > RAW TSC NTP corrected TSC
> >> >>> > t0  10  10
> >> >>> > t1  20  19.99
> >> >>> > t2  30  29.98
> >> >>> > t3  40  39.97
> >> >>> > t4  50  49.96

(1)

> >> >>> >
> >> >>> > ...
> >> >>> >
> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> >> >>> > you can see what will happen.
> >> >>>
> >> >>> Sure, but why would you ever switch from one to the other?
> >> >>
> >> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
> >> >> resume, the TSC certainly increases at the same rate as before, but the
> >> >> raw TSC restarted counting from 0 and systemtime has increased slower
> >> >> than the guest kvmclock.
> >> >
> >> > Wait, are we talking about the host's NTP or the guest's NTP?
> >> >
> >> > If it's the host's, then wouldn't systemtime be reset after resume to
> >> > the NTP corrected value?  If so, the guest wouldn't see time go
> >> > backwards.
> >> >
> >> > If it's the guest's, then the guest's NTP correction is applied on top
> >> > of kvmclock, and this shouldn't matter.
> >> >
> >> > I still feel like I'm missing something very basic here.
> >> >
> >>
> >> OK, I think I get it.
> >>
> >> Marcelo, I thought that kvmclock was supposed to propagate the host's
> >> correction to the guest.  If it did, indeed, propagate the correction
> >> then, after resume, the host's new system_time would match the guest's
> >> idea of it (after accounting for the guest's long nap), and I don't
> >> think there would be a problem.
> >> That being said, I can't find the code in the masterclock stuff that
> >> would actually do this.
> >
> > Guest clock is maintained by guest timekeeping code, which does:
> >
> > timer_interrupt()
> > offset = read clocksource since last timer interrupt
> > accumulate_to_systemclock(offset)
> >
> > The frequency correction of NTP in the host can be applied to
> > kvmclock, which will be visible to the guest
> > at "read clocksource since last timer interrupt"
> > (kvmclock_clocksource_read function).
> 
> pvclock_clocksource_read?  That seems to do the same thing as all the
> other clocksource access functions.
> 
> >
> > This does not mean that the NTP correction in the host is propagated
> > > to the guest's system clock directly.
> >
> > (For example, the guest can run NTP which is free to do further
> > adjustments at "accumulate_to_systemclock(offset)" time).
> 
> Of course.  But I expected that, in the absence of NTP on the guest,
> > the guest would track the host's *corrected* time.
> 
> >
> >> If, on the other hand, the host's NTP correction is not supposed to
> >> propagate to the guest,
> >
> > This is optional. There is a module option to control this, in fact.
> >
> > > It's nice to have, because then you can execute a guest without NTP
> > (say without network connection), and have a kvmclock (kvmclock is a
> > clocksource, not a guest system clock) which is NTP corrected.
> 
> > Can you point to how this works?  I found kvm_guest_time_update, which
> is called under circumstances that I haven't untangled.  I can't
> really tell what it's trying to do.

Documentation/virtual/kvm/timekeeping.txt.

> In any case, this still seems much more convoluted than it has to be.
> In the case in which the host has a stable TSC (tsc is selected in the
> core timekeeping code, VCLOCK_TSC is set, etc), which is basically all
> the time on the last few generations of CPUs, then the core
> timekeeping code is already exposing a linear function that's supposed
> to be used for monotonic, cpu-local access to a corrected nanosecond
> counter.  It's even in pretty much exactly the right form to pass
> through to the guest via pvclock in the gtod data.  Why doesn't KVM
> pass it through verbatim, updated in real time?  Is there some legacy
> reason that KVM must apply its own corrections and has to jump through
> hoops to pause vcpus when updating those vcpu's copies of the pvclock
> data?

Read the comment in x86.c which starts with
" *
 * Assuming a stable TSC across physical CPUS, and a stable TSC
 * across virtual CPUs, the following condition is possible.
 * Each numbered line represents an event visible to both
 * CPUs at the next numbered event.
"

> >> then shouldn't KVM just update system_time on
> >> resume to whatever the guest would think it had (which I think would
> >> be equivalent to the host's CLOCK_MONOTONIC_RAW value, 

Re: kvmclock doesn't work, help?

2015-12-17 Thread Andy Lutomirski
On Thu, Dec 17, 2015 at 11:08 AM, Marcelo Tosatti  wrote:
> On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
>> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti  wrote:
>> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
>> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski  
>> >> wrote:
>> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini  
>> >> > wrote:
>> >> >>
>> >> >>
>> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
>> >> >>> > RAW TSC NTP corrected TSC
>> >> >>> > t0  10  10
>> >> >>> > t1  20  19.99
>> >> >>> > t2  30  29.98
>> >> >>> > t3  40  39.97
>> >> >>> > t4  50  49.96
>
> (1)
>
>> >> >>> >
>> >> >>> > ...
>> >> >>> >
>> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
>> >> >>> > you can see what will happen.
>> >> >>>
>> >> >>> Sure, but why would you ever switch from one to the other?
>> >> >>
>> >> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
>> >> >> resume, the TSC certainly increases at the same rate as before, but the
>> >> >> raw TSC restarted counting from 0 and systemtime has increased slower
>> >> >> than the guest kvmclock.
>> >> >
>> >> > Wait, are we talking about the host's NTP or the guest's NTP?
>> >> >
>> >> > If it's the host's, then wouldn't systemtime be reset after resume to
>> >> > the NTP corrected value?  If so, the guest wouldn't see time go
>> >> > backwards.
>> >> >
>> >> > If it's the guest's, then the guest's NTP correction is applied on top
>> >> > of kvmclock, and this shouldn't matter.
>> >> >
>> >> > I still feel like I'm missing something very basic here.
>> >> >
>> >>
>> >> OK, I think I get it.
>> >>
>> >> Marcelo, I thought that kvmclock was supposed to propagate the host's
>> >> correction to the guest.  If it did, indeed, propagate the correction
>> >> then, after resume, the host's new system_time would match the guest's
>> >> idea of it (after accounting for the guest's long nap), and I don't
>> >> think there would be a problem.
>> >> That being said, I can't find the code in the masterclock stuff that
>> >> would actually do this.
>> >
>> > Guest clock is maintained by guest timekeeping code, which does:
>> >
>> > timer_interrupt()
>> > offset = read clocksource since last timer interrupt
>> > accumulate_to_systemclock(offset)
>> >
>> > The frequency correction of NTP in the host can be applied to
>> > kvmclock, which will be visible to the guest
>> > at "read clocksource since last timer interrupt"
>> > (kvmclock_clocksource_read function).
>>
>> pvclock_clocksource_read?  That seems to do the same thing as all the
>> other clocksource access functions.
>>
>> >
>> > This does not mean that the NTP correction in the host is propagated
>> > to the guest's system clock directly.
>> >
>> > (For example, the guest can run NTP which is free to do further
>> > adjustments at "accumulate_to_systemclock(offset)" time).
>>
>> Of course.  But I expected that, in the absence of NTP on the guest,
>> the guest would track the host's *corrected* time.
>>
>> >
>> >> If, on the other hand, the host's NTP correction is not supposed to
>> >> propagate to the guest,
>> >
>> > This is optional. There is a module option to control this, in fact.
>> >
>> > It's nice to have, because then you can execute a guest without NTP
>> > (say without network connection), and have a kvmclock (kvmclock is a
>> > clocksource, not a guest system clock) which is NTP corrected.
>>
>> Can you point to how this works?  I found kvm_guest_time_update, which
>> is called under circumstances that I haven't untangled.  I can't
>> really tell what it's trying to do.
>
> Documentation/virtual/kvm/timekeeping.txt.
>

That document is really long.  I skimmed it and found nothing.

>> In any case, this still seems much more convoluted than it has to be.
>> In the case in which the host has a stable TSC (tsc is selected in the
>> core timekeeping code, VCLOCK_TSC is set, etc), which is basically all
>> the time on the last few generations of CPUs, then the core
>> timekeeping code is already exposing a linear function that's supposed
>> to be used for monotonic, cpu-local access to a corrected nanosecond
>> counter.  It's even in pretty much exactly the right form to pass
>> through to the guest via pvclock in the gtod data.  Why doesn't KVM
>> pass it through verbatim, updated in real time?  Is there some legacy
>> reason that KVM must apply its own corrections and has to jump through
>> hoops to pause vcpus when updating those vcpu's copies of the pvclock
>> data?
>
> Read the comment in x86.c which starts with
> " *
>  * Assuming a stable TSC across physical CPUS, and a stable TSC
>  * across virtual CPUs, the following condition is possible.
>  * Each numbered line 

Re: kvmclock doesn't work, help?

2015-12-16 Thread Andy Lutomirski
On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini  wrote:
>
>
> On 14/12/2015 23:31, Andy Lutomirski wrote:
>> > RAW TSC NTP corrected TSC
>> > t0  10  10
>> > t1  20  19.99
>> > t2  30  29.98
>> > t3  40  39.97
>> > t4  50  49.96
>> >
>> > ...
>> >
>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
>> > you can see what will happen.
>>
>> Sure, but why would you ever switch from one to the other?
>
> The guest uses the raw TSC and systemtime = 0 until suspend.  After
> resume, the TSC certainly increases at the same rate as before, but the
> raw TSC restarted counting from 0 and systemtime has increased slower
> than the guest kvmclock.

Wait, are we talking about the host's NTP or the guest's NTP?

If it's the host's, then wouldn't systemtime be reset after resume to
the NTP corrected value?  If so, the guest wouldn't see time go
backwards.

If it's the guest's, then the guest's NTP correction is applied on top
of kvmclock, and this shouldn't matter.

I still feel like I'm missing something very basic here.

--Andy


Re: kvmclock doesn't work, help?

2015-12-16 Thread Andy Lutomirski
On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski  wrote:
> On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini  wrote:
>>
>>
>> On 14/12/2015 23:31, Andy Lutomirski wrote:
>>> > RAW TSC NTP corrected TSC
>>> > t0  10  10
>>> > t1  20  19.99
>>> > t2  30  29.98
>>> > t3  40  39.97
>>> > t4  50  49.96
>>> >
>>> > ...
>>> >
>>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
>>> > you can see what will happen.
>>>
>>> Sure, but why would you ever switch from one to the other?
>>
>> The guest uses the raw TSC and systemtime = 0 until suspend.  After
>> resume, the TSC certainly increases at the same rate as before, but the
>> raw TSC restarted counting from 0 and systemtime has increased slower
>> than the guest kvmclock.
>
> Wait, are we talking about the host's NTP or the guest's NTP?
>
> If it's the host's, then wouldn't systemtime be reset after resume to
> the NTP corrected value?  If so, the guest wouldn't see time go
> backwards.
>
> If it's the guest's, then the guest's NTP correction is applied on top
> of kvmclock, and this shouldn't matter.
>
> I still feel like I'm missing something very basic here.
>

OK, I think I get it.

Marcelo, I thought that kvmclock was supposed to propagate the host's
correction to the guest.  If it did, indeed, propagate the correction
then, after resume, the host's new system_time would match the guest's
idea of it (after accounting for the guest's long nap), and I don't
think there would be a problem.

That being said, I can't find the code in the masterclock stuff that
would actually do this.

If, on the other hand, the host's NTP correction is not supposed to
propagate to the guest, then shouldn't KVM just update system_time on
resume to whatever the guest would think it had (which I think would
be equivalent to the host's CLOCK_MONOTONIC_RAW value, possibly
shifted by some per-guest constant offset)?

--Andy


Re: kvmclock doesn't work, help?

2015-12-15 Thread Paolo Bonzini


On 14/12/2015 23:31, Andy Lutomirski wrote:
> > RAW TSC NTP corrected TSC
> > t0  10  10
> > t1  20  19.99
> > t2  30  29.98
> > t3  40  39.97
> > t4  50  49.96
> >
> > ...
> >
> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> > you can see what will happen.
>
> Sure, but why would you ever switch from one to the other?

The guest uses the raw TSC and systemtime = 0 until suspend.  After
resume, the TSC certainly increases at the same rate as before, but the
raw TSC restarted counting from 0 and systemtime has increased slower
than the guest kvmclock.

Paolo

> The only things that need to be monotonic are the output from
> vread_pvclock and the in-kernel equivalent, I think.


Re: kvmclock doesn't work, help?

2015-12-14 Thread Paolo Bonzini


On 11/12/2015 22:57, Andy Lutomirski wrote:
> I'm still not seeing the issue.
> 
> The formula is:
> 
> (((rdtsc - pvti->tsc_timestamp) * pvti->tsc_to_system_mul) >>
> pvti->tsc_shift) + pvti->system_time
> 
> Obviously, if you reset pvti->tsc_timestamp to the current tsc value
> after suspend/resume, you would also need to update system_time.
> 
> I don't see what this has to do with suspend/resume or with whether
> the effective scale factor is greater than or less than one.  The only
> suspend/resume interaction I can see is that, if the host allows the
> guest-observed TSC value to jump (which is arguably a bug, but that's
> not important here), it needs to update pvti before resuming the
> guest.

Which is not an issue, since freezing obviously gets all CPUs out of
guest mode.

Marcelo, can you provide an example with made-up values for tsc and pvti?

> Can you clarify concretely what goes wrong here?
> 
> (I'm also at a bit of a loss as to why this needs both system_time and
> tsc_timestamp.  They're redundant in the sense that you could set
> tsc_timestamp to zero and subtract (tsc_timestamp * tsc_to_system_mul) >>
> tsc_shift from system_time without changing the result of the
> calculation.)

You would have to ensure that all elements of pvti are rounded correctly
whenever the base TSC is updated.  Doable, but it does seem simpler to
keep subtract-TSC and add-nanoseconds separate.
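The rounding concern can be seen in a small sketch that evaluates the formula exactly as quoted above, next to the "folded" variant in which tsc_timestamp is absorbed into system_time. The struct is a hypothetical stand-in with only the four fields named in this thread, the field widths are assumptions, and the real pvclock scaling differs in detail; 64-bit overflow of the multiplication is ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the pvti fields discussed in the thread. */
struct pvti_sketch {
    uint64_t tsc_timestamp;     /* TSC value at the last host update */
    uint64_t system_time;       /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* scale multiplier */
    uint8_t  tsc_shift;         /* right shift applied after the multiply */
};

/* Formula as quoted: subtract TSC first, scale, then add nanoseconds. */
uint64_t pvti_read(const struct pvti_sketch *p, uint64_t tsc)
{
    assert(p->tsc_shift < 64);
    return (((tsc - p->tsc_timestamp) * p->tsc_to_system_mul)
            >> p->tsc_shift) + p->system_time;
}

/* Folded variant: behaves as if tsc_timestamp were 0 and the scaled
 * timestamp had been subtracted from system_time ahead of time. */
uint64_t pvti_read_folded(const struct pvti_sketch *p, uint64_t tsc)
{
    assert(p->tsc_shift < 64);
    uint64_t base = p->system_time -
        ((p->tsc_timestamp * p->tsc_to_system_mul) >> p->tsc_shift);
    return ((tsc * p->tsc_to_system_mul) >> p->tsc_shift) + base;
}
```

With tsc_timestamp = 1001, system_time = 5000, mul = 3 and shift = 1, both variants return 5000 at tsc = 1001, but at tsc = 1004 the first returns 5004 and the folded one 5005, because the shift truncates the two products differently.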

Paolo


Re: kvmclock doesn't work, help?

2015-12-14 Thread Marcelo Tosatti
On Fri, Dec 11, 2015 at 01:57:23PM -0800, Andy Lutomirski wrote:
> On Thu, Dec 10, 2015 at 1:32 PM, Marcelo Tosatti  wrote:
> > On Wed, Dec 09, 2015 at 01:10:59PM -0800, Andy Lutomirski wrote:
> >> I'm trying to clean up kvmclock and I can't get it to work at all.  My
> >> host is 4.4.0-rc3-ish on a Skylake laptop that has a working TSC.
> >>
> >> If I boot an SMP (2 vcpus) guest, tracing says:
> >>
> >>  qemu-system-x86-2517  [001] 102242.610654: kvm_update_master_clock:
> >> masterclock 0 hostclock tsc offsetmatched 0
> >>  qemu-system-x86-2521  [000] 102242.613742: kvm_track_tsc:
> >> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
> >>  qemu-system-x86-2522  [000] 102242.622959: kvm_track_tsc:
> >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
> >>  qemu-system-x86-2521  [000] 102242.645123: kvm_track_tsc:
> >> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
> >>  qemu-system-x86-2522  [000] 102242.647291: kvm_track_tsc:
> >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
> >>  qemu-system-x86-2521  [000] 102242.653369: kvm_track_tsc:
> >> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
> >>  qemu-system-x86-2522  [000] 102242.653429: kvm_track_tsc:
> >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
> >>  qemu-system-x86-2517  [001] 102242.653447: kvm_update_master_clock:
> >> masterclock 0 hostclock tsc offsetmatched 1
> >>  qemu-system-x86-2521  [000] 102242.653657: kvm_update_master_clock:
> >> masterclock 0 hostclock tsc offsetmatched 1
> >>  qemu-system-x86-2522  [002] 102242.664448: kvm_update_master_clock:
> >> masterclock 0 hostclock tsc offsetmatched 1
> >>
> >>
> >> If I boot a UP guest, tracing says:
> >>
> >>  qemu-system-x86-2567  [001] 102370.447484: kvm_update_master_clock:
> >> masterclock 0 hostclock tsc offsetmatched 1
> >>  qemu-system-x86-2571  [002] 102370.447688: kvm_update_master_clock:
> >> masterclock 0 hostclock tsc offsetmatched 1
> >>
> >> I suspect, but I haven't verified, that this is fallout from:
> >>
> >> commit 16a9602158861687c78b6de6dc6a79e6e8a9136f
> >> Author: Marcelo Tosatti 
> >> Date:   Wed May 14 12:43:24 2014 -0300
> >>
> >> KVM: x86: disable master clock if TSC is reset during suspend
> >>
> >> Updating system_time from the kernel clock once master clock
> >> has been enabled can result in time backwards event, in case
> >> kernel clock frequency is lower than TSC frequency.
> >>
> >> Disable master clock in case it is necessary to update it
> >> from the resume path.
> >>
> >> Signed-off-by: Marcelo Tosatti 
> >> Signed-off-by: Paolo Bonzini 
> >>
> >>
> >> Can we please stop making kvmclock more complex?  It's a beast right
> >> now, and not in a good way.  It's far too tangled with the vclock
> >> machinery on both the host and guest sides, the pvclock stuff is not
> >> well thought out (even in principle in an ABI sense), and it's never
> >> been clear to me what problem exactly the kvmclock stuff is supposed
> >> to solve.
> >>
> >> I'm somewhat tempted to suggest that we delete kvmclock entirely and
> >> start over.  A correctly functioning KVM guest using TSC (i.e.
> >> ignoring kvmclock entirely) seems to work rather more reliably and
> >> considerably faster than a kvmclock guest.
> >>
> >> --Andy
> >>
> >> --
> >> Andy Lutomirski
> >> AMA Capital Management, LLC
> >
> > Andy,
> >
> > I am all for solving practical problems rather than satisfying aesthetic
> > preferences.
> >
> >> Updating system_time from the kernel clock once master clock
> >> has been enabled can result in time backwards event, in case
> >> kernel clock frequency is lower than TSC frequency.
> >>
> >> Disable master clock in case it is necessary to update it
> >> from the resume path.
> >
> >> once master clock
> >> has been enabled can result in time backwards event, in case
> >> kernel clock frequency is lower than TSC frequency.
> >
> > guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc reads.
> >
> > If the effective frequency of the kernel clock is lower (for example
> > due to NTP correcting the TSC frequency of the system), and you resume
> > and update the system, the following happens:
> >
> > guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc 
> > reads=LARGE VALUE.

guest reads clock to memory at location A = scaled tsc read.

(note TSC is counting at frequency higher than advertised by
processor, that's why NTP has to "slow down" the kernel clock
which is maintained by successive reads of the TSC).

> > suspend/resume event.
> > guest visible clock = tsc_timestamp (updated at time N) + scaled tsc 
> > reads=0.

Now the guest visible clock contains a tsc_timestamp that has been 
corrected by NTP, over say 5 days. So the tiny NTP correction has
been added up to something 

Re: kvmclock doesn't work, help?

2015-12-14 Thread Marcelo Tosatti
On Mon, Dec 14, 2015 at 10:07:21AM -0800, Andy Lutomirski wrote:
> On Fri, Dec 11, 2015 at 3:48 PM, Marcelo Tosatti  wrote:
> > On Fri, Dec 11, 2015 at 01:57:23PM -0800, Andy Lutomirski wrote:
> >> On Thu, Dec 10, 2015 at 1:32 PM, Marcelo Tosatti  
> >> wrote:
> >> > On Wed, Dec 09, 2015 at 01:10:59PM -0800, Andy Lutomirski wrote:
> >> >> I'm trying to clean up kvmclock and I can't get it to work at all.  My
> >> >> host is 4.4.0-rc3-ish on a Skylake laptop that has a working TSC.
> >> >>
> >> >> If I boot an SMP (2 vcpus) guest, tracing says:
> >> >>
> >> >>  qemu-system-x86-2517  [001] 102242.610654: kvm_update_master_clock:
> >> >> masterclock 0 hostclock tsc offsetmatched 0
> >> >>  qemu-system-x86-2521  [000] 102242.613742: kvm_track_tsc:
> >> >> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
> >> >>  qemu-system-x86-2522  [000] 102242.622959: kvm_track_tsc:
> >> >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
> >> >>  qemu-system-x86-2521  [000] 102242.645123: kvm_track_tsc:
> >> >> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
> >> >>  qemu-system-x86-2522  [000] 102242.647291: kvm_track_tsc:
> >> >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
> >> >>  qemu-system-x86-2521  [000] 102242.653369: kvm_track_tsc:
> >> >> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
> >> >>  qemu-system-x86-2522  [000] 102242.653429: kvm_track_tsc:
> >> >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
> >> >>  qemu-system-x86-2517  [001] 102242.653447: kvm_update_master_clock:
> >> >> masterclock 0 hostclock tsc offsetmatched 1
> >> >>  qemu-system-x86-2521  [000] 102242.653657: kvm_update_master_clock:
> >> >> masterclock 0 hostclock tsc offsetmatched 1
> >> >>  qemu-system-x86-2522  [002] 102242.664448: kvm_update_master_clock:
> >> >> masterclock 0 hostclock tsc offsetmatched 1
> >> >>
> >> >>
> >> >> If I boot a UP guest, tracing says:
> >> >>
> >> >>  qemu-system-x86-2567  [001] 102370.447484: kvm_update_master_clock:
> >> >> masterclock 0 hostclock tsc offsetmatched 1
> >> >>  qemu-system-x86-2571  [002] 102370.447688: kvm_update_master_clock:
> >> >> masterclock 0 hostclock tsc offsetmatched 1
> >> >>
> >> >> I suspect, but I haven't verified, that this is fallout from:
> >> >>
> >> >> commit 16a9602158861687c78b6de6dc6a79e6e8a9136f
> >> >> Author: Marcelo Tosatti 
> >> >> Date:   Wed May 14 12:43:24 2014 -0300
> >> >>
> >> >> KVM: x86: disable master clock if TSC is reset during suspend
> >> >>
> >> >> Updating system_time from the kernel clock once master clock
> >> >> has been enabled can result in time backwards event, in case
> >> >> kernel clock frequency is lower than TSC frequency.
> >> >>
> >> >> Disable master clock in case it is necessary to update it
> >> >> from the resume path.
> >> >>
> >> >> Signed-off-by: Marcelo Tosatti 
> >> >> Signed-off-by: Paolo Bonzini 
> >> >>
> >> >>
> >> >> Can we please stop making kvmclock more complex?  It's a beast right
> >> >> now, and not in a good way.  It's far too tangled with the vclock
> >> >> machinery on both the host and guest sides, the pvclock stuff is not
> >> >> well thought out (even in principle in an ABI sense), and it's never
> >> >> been clear to me what problem exactly the kvmclock stuff is supposed
> >> >> to solve.
> >> >>
> >> >> I'm somewhat tempted to suggest that we delete kvmclock entirely and
> >> >> start over.  A correctly functioning KVM guest using TSC (i.e.
> >> >> ignoring kvmclock entirely) seems to work rather more reliably and
> >> >> considerably faster than a kvmclock guest.
> >> >>
> >> >> --Andy
> >> >>
> >> >> --
> >> >> Andy Lutomirski
> >> >> AMA Capital Management, LLC
> >> >
> >> > Andy,
> >> >
> >> > I am all for solving practical problems rather than satisfying aesthetic
> >> > preferences.
> >> >
> >> >> Updating system_time from the kernel clock once master clock
> >> >> has been enabled can result in time backwards event, in case
> >> >> kernel clock frequency is lower than TSC frequency.
> >> >>
> >> >> Disable master clock in case it is necessary to update it
> >> >> from the resume path.
> >> >
> >> >> once master clock
> >> >> has been enabled can result in time backwards event, in case
> >> >> kernel clock frequency is lower than TSC frequency.
> >> >
> >> > guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc 
> >> > reads.
> >> >
> >> > If the effective frequency of the kernel clock is lower (for example
> >> > due to NTP correcting the TSC frequency of the system), and you resume
> >> > and update the system, the following happens:
> >> >
> >> > guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc 
> >> > reads=LARGE VALUE.
> >
> > guest reads clock to memory at location A = scaled tsc 

Re: kvmclock doesn't work, help?

2015-12-14 Thread Andy Lutomirski
On Fri, Dec 11, 2015 at 3:48 PM, Marcelo Tosatti  wrote:
> On Fri, Dec 11, 2015 at 01:57:23PM -0800, Andy Lutomirski wrote:
>> On Thu, Dec 10, 2015 at 1:32 PM, Marcelo Tosatti  wrote:
>> > On Wed, Dec 09, 2015 at 01:10:59PM -0800, Andy Lutomirski wrote:
>> >> I'm trying to clean up kvmclock and I can't get it to work at all.  My
>> >> host is 4.4.0-rc3-ish on a Skylake laptop that has a working TSC.
>> >>
>> >> If I boot an SMP (2 vcpus) guest, tracing says:
>> >>
>> >>  qemu-system-x86-2517  [001] 102242.610654: kvm_update_master_clock:
>> >> masterclock 0 hostclock tsc offsetmatched 0
>> >>  qemu-system-x86-2521  [000] 102242.613742: kvm_track_tsc:
>> >> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
>> >>  qemu-system-x86-2522  [000] 102242.622959: kvm_track_tsc:
>> >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>> >>  qemu-system-x86-2521  [000] 102242.645123: kvm_track_tsc:
>> >> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>> >>  qemu-system-x86-2522  [000] 102242.647291: kvm_track_tsc:
>> >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>> >>  qemu-system-x86-2521  [000] 102242.653369: kvm_track_tsc:
>> >> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>> >>  qemu-system-x86-2522  [000] 102242.653429: kvm_track_tsc:
>> >> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>> >>  qemu-system-x86-2517  [001] 102242.653447: kvm_update_master_clock:
>> >> masterclock 0 hostclock tsc offsetmatched 1
>> >>  qemu-system-x86-2521  [000] 102242.653657: kvm_update_master_clock:
>> >> masterclock 0 hostclock tsc offsetmatched 1
>> >>  qemu-system-x86-2522  [002] 102242.664448: kvm_update_master_clock:
>> >> masterclock 0 hostclock tsc offsetmatched 1
>> >>
>> >>
>> >> If I boot a UP guest, tracing says:
>> >>
>> >>  qemu-system-x86-2567  [001] 102370.447484: kvm_update_master_clock:
>> >> masterclock 0 hostclock tsc offsetmatched 1
>> >>  qemu-system-x86-2571  [002] 102370.447688: kvm_update_master_clock:
>> >> masterclock 0 hostclock tsc offsetmatched 1
>> >>
>> >> I suspect, but I haven't verified, that this is fallout from:
>> >>
>> >> commit 16a9602158861687c78b6de6dc6a79e6e8a9136f
>> >> Author: Marcelo Tosatti 
>> >> Date:   Wed May 14 12:43:24 2014 -0300
>> >>
>> >> KVM: x86: disable master clock if TSC is reset during suspend
>> >>
>> >> Updating system_time from the kernel clock once master clock
>> >> has been enabled can result in time backwards event, in case
>> >> kernel clock frequency is lower than TSC frequency.
>> >>
>> >> Disable master clock in case it is necessary to update it
>> >> from the resume path.
>> >>
>> >> Signed-off-by: Marcelo Tosatti 
>> >> Signed-off-by: Paolo Bonzini 
>> >>
>> >>
>> >> Can we please stop making kvmclock more complex?  It's a beast right
>> >> now, and not in a good way.  It's far too tangled with the vclock
>> >> machinery on both the host and guest sides, the pvclock stuff is not
>> >> well thought out (even in principle in an ABI sense), and it's never
>> >> been clear to me what problem exactly the kvmclock stuff is supposed
>> >> to solve.
>> >>
>> >> I'm somewhat tempted to suggest that we delete kvmclock entirely and
>> >> start over.  A correctly functioning KVM guest using TSC (i.e.
>> >> ignoring kvmclock entirely) seems to work rather more reliably and
>> >> considerably faster than a kvmclock guest.
>> >>
>> >> --Andy
>> >>
>> >> --
>> >> Andy Lutomirski
>> >> AMA Capital Management, LLC
>> >
>> > Andy,
>> >
>> > I am all for solving practical problems rather than satisfying aesthetic
>> > preferences.
>> >
>> >> Updating system_time from the kernel clock once master clock
>> >> has been enabled can result in time backwards event, in case
>> >> kernel clock frequency is lower than TSC frequency.
>> >>
>> >> Disable master clock in case it is necessary to update it
>> >> from the resume path.
>> >
>> >> once master clock
>> >> has been enabled can result in time backwards event, in case
>> >> kernel clock frequency is lower than TSC frequency.
>> >
>> > guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc reads.
>> >
>> > If the effective frequency of the kernel clock is lower (for example
>> > due to NTP correcting the TSC frequency of the system), and you resume
>> > and update the system, the following happens:
>> >
>> > guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc 
>> > reads=LARGE VALUE.
>
> guest reads clock to memory at location A = scaled tsc read.
>
> (note TSC is counting at frequency higher than advertised by
> processor, that's why NTP has to "slow down" the kernel clock
> which is maintained by successive reads of the TSC).
>
>> > suspend/resume event.
>> > guest visible clock = tsc_timestamp (updated at time N) + 

Re: kvmclock doesn't work, help?

2015-12-14 Thread Marcelo Tosatti
On Mon, Dec 14, 2015 at 02:44:15PM +0100, Paolo Bonzini wrote:
> 
> 
> On 11/12/2015 22:57, Andy Lutomirski wrote:
> > I'm still not seeing the issue.
> > 
> > The formula is:
> > 
> > (((rdtsc - pvti->tsc_timestamp) * pvti->tsc_to_system_mul) >>
> > pvti->tsc_shift) + pvti->system_time
> > 
> > Obviously, if you reset pvti->tsc_timestamp to the current tsc value
> > after suspend/resume, you would also need to update system_time.
> > 
> > I don't see what this has to do with suspend/resume or with whether
> > the effective scale factor is greater than or less than one.  The only
> > suspend/resume interaction I can see is that, if the host allows the
> > guest-observed TSC value to jump (which is arguably a bug, but that's
> > not important here), it needs to update pvti before resuming the
> > guest.
> 
> Which is not an issue, since freezing obviously gets all CPUs out of
> guest mode.
> 
> Marcelo, can you provide an example with made-up values for tsc and pvti?

I meant "systemtime" at ^.

guest visible clock = systemtime (updated at time 0, guest initialization) +
                      scaled tsc reads = LARGE VALUE.
  ^^
guest reads clock to memory at location A = scaled tsc read.
-> suspend/resume event
guest visible clock = systemtime (updated at time AFTER SUSPEND) +
                      scaled tsc reads = 0.
guest reads clock to memory at location B.

So before the suspend/resume event, the clock is the RAW TSC values
(scaled by kvmclock, but the frequency of the RAW TSC). 

After suspend/resume event, the clock is updated from the host
via get_kernel_ns(), which reads the corrected NTP frequency TSC.

So you switch the timebase, from a clock running at a given frequency,
to a clock running at another frequency (effective frequency).

Example:

        RAW TSC     NTP corrected TSC
t0      10          10
t1      20          19.99
t2      30          29.98
t3      40          39.97
t4      50          49.96

...

if you suddenly switch from RAW TSC to NTP corrected TSC, 
you can see what will happen.

Does that make sense?
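
The table can be turned into a small sketch.  The helper names raw_clock(),
corrected_clock() and backward_step() are made up for illustration, and values
are kept in hundredths of a unit so that the 9.99 step stays an integer.  This
only models the made-up numbers above, not real kvmclock code.

```c
#include <assert.h>
#include <stdint.h>

/* Clock driven by the raw TSC: starts at 10.00 and gains 10.00 per tick. */
static int64_t raw_clock(int tick)
{
    assert(tick >= 0);
    return 1000 + 1000 * (int64_t)tick;
}

/* Clock slowed down by NTP: starts at 10.00 and gains only 9.99 per tick. */
static int64_t corrected_clock(int tick)
{
    assert(tick >= 0);
    return 1000 + 999 * (int64_t)tick;
}

/*
 * Backward step (in hundredths) seen by a reader that sampled raw_clock()
 * at `tick` and is then rebased onto corrected_clock() at the same tick.
 */
static int64_t backward_step(int tick)
{
    return raw_clock(tick) - corrected_clock(tick);
}
```

At t4 the raw clock reads 50.00 and the corrected clock 49.96, so a guest that
stored the raw value at location A and is then rebased onto the corrected
clock would see its next read come out 0.04 lower.  The gap grows by 0.01 per
tick, so the longer the guest has run on the raw timebase, the larger the step.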

> > suspend/resume event.
> > guest visible clock = tsc_timestamp (updated at time N) + scaled tsc
> > reads=0.


> 
> > Can you clarify concretely what goes wrong here?
> > 
> > (I'm also at a bit of a loss as to why this needs both system_time and
> > tsc_timestamp.  They're redundant in the sense that you could set
> > tsc_timestamp to zero and subtract ((tsc_timestamp * tsc_to_system_mul) >>
> > tsc_shift) from system_time without changing the result of the
> > calculation.)
> 
> You would have to ensure that all elements of pvti are rounded correctly
> whenever the base TSC is updated.  Doable, but it does seem simpler to
> keep subtract-TSC and add-nanoseconds separate.
> 
> Paolo


Re: kvmclock doesn't work, help?

2015-12-14 Thread Andy Lutomirski
On Mon, Dec 14, 2015 at 2:00 PM, Marcelo Tosatti  wrote:
> On Mon, Dec 14, 2015 at 02:44:15PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 11/12/2015 22:57, Andy Lutomirski wrote:
>> > I'm still not seeing the issue.
>> >
>> > The formula is:
>> >
>> > (((rdtsc - pvti->tsc_timestamp) * pvti->tsc_to_system_mul) >>
>> > pvti->tsc_shift) + pvti->system_time
>> >
>> > Obviously, if you reset pvti->tsc_timestamp to the current tsc value
>> > after suspend/resume, you would also need to update system_time.
>> >
>> > I don't see what this has to do with suspend/resume or with whether
>> > the effective scale factor is greater than or less than one.  The only
>> > suspend/resume interaction I can see is that, if the host allows the
>> > guest-observed TSC value to jump (which is arguably a bug, but that's
>> > not important here), it needs to update pvti before resuming the
>> > guest.
>>
>> Which is not an issue, since freezing obviously gets all CPUs out of
>> guest mode.
>>
>> Marcelo, can you provide an example with made-up values for tsc and pvti?
>
> I meant "systemtime" at ^.
>
> guest visible clock = systemtime (updated at time 0, guest initialization) +
>                       scaled tsc reads = LARGE VALUE.
>   ^^
> guest reads clock to memory at location A = scaled tsc read.
> -> suspend/resume event
> guest visible clock = systemtime (updated at time AFTER SUSPEND) +
>                       scaled tsc reads = 0.
> guest reads clock to memory at location B.
>
> So before the suspend/resume event, the clock is the RAW TSC values
> (scaled by kvmclock, but the frequency of the RAW TSC).
>
> After suspend/resume event, the clock is updated from the host
> via get_kernel_ns(), which reads the corrected NTP frequency TSC.
>
> So you switch the timebase, from a clock running at a given frequency,
> to a clock running at another frequency (effective frequency).
>
> Example:
>
>         RAW TSC     NTP corrected TSC
> t0      10          10
> t1      20          19.99
> t2      30          29.98
> t3      40          39.97
> t4      50          49.96
>
> ...
>
> if you suddenly switch from RAW TSC to NTP corrected TSC,
> you can see what will happen.

Sure, but why would you ever switch from one to the other?  I'm still
not seeing the scenario under which this discontinuity is visible to
anything other than the kvmclock code itself.

The only things that need to be monotonic are the output from
vread_pvclock and the in-kernel equivalent, I think.
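
One way to keep that output monotonic even if the underlying timebase steps
backwards is to clamp each read against the largest value already returned.
The sketch below is an assumption about how such a guard could look, not a
description of what vread_pvclock does; last_value and monotonic_clamp() are
hypothetical names, and C11 atomics stand in for whatever primitive the kernel
would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Largest timestamp handed out so far, shared by all readers. */
static _Atomic uint64_t last_value;

/* Return now_ns, or the previously returned maximum if now_ns is older. */
static uint64_t monotonic_clamp(uint64_t now_ns)
{
    uint64_t last = atomic_load(&last_value);

    while (now_ns > last) {
        /* On failure the CAS reloads `last`, and the loop re-checks it. */
        if (atomic_compare_exchange_weak(&last_value, &last, now_ns))
            return now_ns;
    }
    assert(last >= now_ns); /* the value handed back never precedes the input */
    return last;
}
```

Feeding it 5000, then 4996, then 5010 (the t4 values from the example in
hundredths, followed by a later read) returns 5000, 5000 and 5010: the
backward step is hidden, at the cost of the clock appearing to stall briefly.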

--Andy


Re: kvmclock doesn't work, help?

2015-12-11 Thread Marcelo Tosatti
On Wed, Dec 09, 2015 at 01:10:59PM -0800, Andy Lutomirski wrote:
> I'm trying to clean up kvmclock and I can't get it to work at all.  My
> host is 4.4.0-rc3-ish on a Skylake laptop that has a working TSC.
> 
> If I boot an SMP (2 vcpus) guest, tracing says:
> 
>  qemu-system-x86-2517  [001] 102242.610654: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 0
>  qemu-system-x86-2521  [000] 102242.613742: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
>  qemu-system-x86-2522  [000] 102242.622959: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2521  [000] 102242.645123: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2522  [000] 102242.647291: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2521  [000] 102242.653369: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2522  [000] 102242.653429: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2517  [001] 102242.653447: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
>  qemu-system-x86-2521  [000] 102242.653657: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
>  qemu-system-x86-2522  [002] 102242.664448: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
> 
> 
> If I boot a UP guest, tracing says:
> 
>  qemu-system-x86-2567  [001] 102370.447484: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
>  qemu-system-x86-2571  [002] 102370.447688: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
> 
> I suspect, but I haven't verified, that this is fallout from:
> 
> commit 16a9602158861687c78b6de6dc6a79e6e8a9136f
> Author: Marcelo Tosatti 
> Date:   Wed May 14 12:43:24 2014 -0300
> 
> KVM: x86: disable master clock if TSC is reset during suspend
> 
> Updating system_time from the kernel clock once master clock
> has been enabled can result in time backwards event, in case
> kernel clock frequency is lower than TSC frequency.
> 
> Disable master clock in case it is necessary to update it
> from the resume path.
> 
> Signed-off-by: Marcelo Tosatti 
> Signed-off-by: Paolo Bonzini 
> 
> 
> Can we please stop making kvmclock more complex?  It's a beast right
> now, and not in a good way.  It's far too tangled with the vclock
> machinery on both the host and guest sides, the pvclock stuff is not
> well thought out (even in principle in an ABI sense), and it's never
> been clear to me what problem exactly the kvmclock stuff is supposed
> to solve.
>
> I'm somewhat tempted to suggest that we delete kvmclock entirely and
> start over.  A correctly functioning KVM guest using TSC (i.e.
> ignoring kvmclock entirely) seems to work rather more reliably and
> considerably faster than a kvmclock guest.
> 
> --Andy
> 
> -- 
> Andy Lutomirski
> AMA Capital Management, LLC

Andy,

I am all for solving practical problems rather than satisfying aesthetic
preferences.

> Updating system_time from the kernel clock once master clock
> has been enabled can result in time backwards event, in case
> kernel clock frequency is lower than TSC frequency.
> 
> Disable master clock in case it is necessary to update it
> from the resume path.

> once master clock
> has been enabled can result in time backwards event, in case
> kernel clock frequency is lower than TSC frequency.

guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc reads.

If the effective frequency of the kernel clock is lower (for example
due to NTP correcting the TSC frequency of the system), and you resume 
and update the system, the following happens:

guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc 
reads=LARGE VALUE.
suspend/resume event.
guest visible clock = tsc_timestamp (updated at time N) + scaled tsc reads=0.



Re: kvmclock doesn't work, help?

2015-12-11 Thread Marcelo Tosatti
On Wed, Dec 09, 2015 at 02:27:36PM -0800, Andy Lutomirski wrote:
> On Wed, Dec 9, 2015 at 2:12 PM, Paolo Bonzini  wrote:
> >
> >
> > On 09/12/2015 22:49, Andy Lutomirski wrote:
> >> On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini  wrote:
> >>>
> >>>
> >>> On 09/12/2015 22:10, Andy Lutomirski wrote:
> >>>> Can we please stop making kvmclock more complex?  It's a beast right
> >>>> now, and not in a good way.  It's far too tangled with the vclock
> >>>> machinery on both the host and guest sides, the pvclock stuff is not
> >>>> well thought out (even in principle in an ABI sense), and it's never
> >>>> been clear to me what problem exactly the kvmclock stuff is supposed
> >>>> to solve.
> >>>
> >>> It's supposed to solve the problem that:
> >>>
> >>> - not all hosts have a working TSC
> >>
> >> Fine, but we don't need any vdso integration for that.
> >
> > Well, you still want a fast time source.  That was a given. :)
> 
> If the host can't figure out how to give *itself* a fast time source,
> I'd be surprised if KVM can manage to give the guest a fast, reliable
> time source.
> 
> >
> >>> - even if they all do, virtual machines can be migrated (or
> >>> saved/restored) to a host with a different TSC frequency
> >>>
> >>> - any MMIO- or PIO-based mechanism to access the current time is orders
> >>> of magnitude slower than the TSC and less precise too.
> >>
> >> Yup.  But TSC by itself gets that benefit, too.
> >
> > Yes, the problem is if you want to solve all three of them.  The first
> > two are solved by the ACPI PM timer with a decent resolution (70
> > ns---much faster anyway than an I/O port access).  The third is solved
> > by TSC.  To solve all three, you need kvmclock.
> 
> Still confused.  Is kvmclock really used in cases where even the host
> can't pull off a working TSC?
> 
> >
> >>>> I'm somewhat tempted to suggest that we delete kvmclock entirely and
> >>>> start over.  A correctly functioning KVM guest using TSC (i.e.
> >>>> ignoring kvmclock entirely) seems to work rather more reliably and
> >>>> considerably faster than a kvmclock guest.
> >>>
> >>> If all your hosts have a working TSC and you don't do migration or
> >>> save/restore, that's a valid configuration.  It's not a good default,
> >>> however.
> >>
> >> Er?
> >>
> >> kvmclock is still really quite slow and buggy.
> >
> > Unless it takes 3-4000 clock cycles for a gettimeofday, which it
> > shouldn't even with vdso disabled, it's definitely not slower than PIO.
> >
> >> And the patch I identified is definitely a problem here:
> >>
> >> [  136.131241] KVM: disabling fast timing permanently due to inability
> >> to recover from suspend
> >>
> >> I got that on the host with this whitespace-damaged patch:
> >>
> >> if (backwards_tsc) {
> >> u64 delta_cyc = max_tsc - local_tsc;
> >> +   if (!backwards_tsc_observed)
> >> +   pr_warn("KVM: disabling fast timing
> >> permanently due to inability to recover from suspend\n");
> >>
> >> when I suspended and resumed.
> >>
> >> Can anyone explain what problem
> >> 16a9602158861687c78b6de6dc6a79e6e8a9136f is supposed to solve?  On
> >> brief inspection, it just seems to be incorrect.  Shouldn't KVM's
> >> normal TSC logic handle that case right?  After all, all vcpus should
> >> be paused when we resume from suspend.  At worst, we should just need
> >> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) on all vcpus.  (Actually,
> >> shouldn't we do that regardless of which way the TSC jumped on
> >> suspend/resume?  After all, the TSC-to-wall-clock offset is quite
> >> likely to change except on the very small handful of CPUs (if any)
> >> that keep the TSC running in S3 and hibernate.)
> >
> > I don't recall the details of that patch, so Marcelo will have to answer
> > this, or Alex too since he chimed in the original thread.  At least it
> > should be made conditional on the existence of a VM at suspend time (and
> > the master clock stuff should be made per VM, as I suggested at
> > https://www.mail-archive.com/kvm@vger.kernel.org/msg102316.html).
> >
> > It would indeed be great if the master clock could be dropped.  But I'm
> > definitely missing some of the subtle details. :(
> 
> Me, too.
> 
> Anyway, see the attached untested patch.  Marcelo?
> 
> --Andy

Read the last email, about the problem.



Re: kvmclock doesn't work, help?

2015-12-11 Thread Marcelo Tosatti
On Wed, Dec 09, 2015 at 01:10:59PM -0800, Andy Lutomirski wrote:
> I'm trying to clean up kvmclock and I can't get it to work at all.  My
> host is 4.4.0-rc3-ish on a Skylake laptop that has a working TSC.
> 
> If I boot an SMP (2 vcpus) guest, tracing says:
> 
>  qemu-system-x86-2517  [001] 102242.610654: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 0
>  qemu-system-x86-2521  [000] 102242.613742: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
>  qemu-system-x86-2522  [000] 102242.622959: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2521  [000] 102242.645123: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2522  [000] 102242.647291: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2521  [000] 102242.653369: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2522  [000] 102242.653429: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-2517  [001] 102242.653447: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
>  qemu-system-x86-2521  [000] 102242.653657: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
>  qemu-system-x86-2522  [002] 102242.664448: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
> 
> 
> If I boot a UP guest, tracing says:
> 
>  qemu-system-x86-2567  [001] 102370.447484: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
>  qemu-system-x86-2571  [002] 102370.447688: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 1
> 
> I suspect, but I haven't verified, that this is fallout from:
> 
> commit 16a9602158861687c78b6de6dc6a79e6e8a9136f
> Author: Marcelo Tosatti 
> Date:   Wed May 14 12:43:24 2014 -0300
> 
> KVM: x86: disable master clock if TSC is reset during suspend
> 
> Updating system_time from the kernel clock once master clock
> has been enabled can result in time backwards event, in case
> kernel clock frequency is lower than TSC frequency.
> 
> Disable master clock in case it is necessary to update it
> from the resume path.
> 
> Signed-off-by: Marcelo Tosatti 
> Signed-off-by: Paolo Bonzini 
> 
> 
> Can we please stop making kvmclock more complex?  It's a beast right
> now, and not in a good way.  It's far too tangled with the vclock
> machinery on both the host and guest sides, the pvclock stuff is not
> well thought out (even in principle in an ABI sense), and it's never
> been clear to me what problem exactly the kvmclock stuff is supposed
> to solve.
> 
> 
> I'm somewhat tempted to suggest that we delete kvmclock entirely and
> start over.  A correctly functioning KVM guest using TSC (i.e.
> ignoring kvmclock entirely) seems to work rather more reliably and
> considerably faster than a kvmclock guest.
> 
> --Andy

Users can do that if they want, with the "clocksource=tsc" kernel option.




Re: kvmclock doesn't work, help?

2015-12-11 Thread Andy Lutomirski
On Thu, Dec 10, 2015 at 1:32 PM, Marcelo Tosatti  wrote:
> On Wed, Dec 09, 2015 at 01:10:59PM -0800, Andy Lutomirski wrote:
>> I'm trying to clean up kvmclock and I can't get it to work at all.  My
>> host is 4.4.0-rc3-ish on a Skylake laptop that has a working TSC.
>>
>> If I boot an SMP (2 vcpus) guest, tracing says:
>>
>>  qemu-system-x86-2517  [001] 102242.610654: kvm_update_master_clock:
>> masterclock 0 hostclock tsc offsetmatched 0
>>  qemu-system-x86-2521  [000] 102242.613742: kvm_track_tsc:
>> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
>>  qemu-system-x86-2522  [000] 102242.622959: kvm_track_tsc:
>> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>>  qemu-system-x86-2521  [000] 102242.645123: kvm_track_tsc:
>> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>>  qemu-system-x86-2522  [000] 102242.647291: kvm_track_tsc:
>> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>>  qemu-system-x86-2521  [000] 102242.653369: kvm_track_tsc:
>> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>>  qemu-system-x86-2522  [000] 102242.653429: kvm_track_tsc:
>> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>>  qemu-system-x86-2517  [001] 102242.653447: kvm_update_master_clock:
>> masterclock 0 hostclock tsc offsetmatched 1
>>  qemu-system-x86-2521  [000] 102242.653657: kvm_update_master_clock:
>> masterclock 0 hostclock tsc offsetmatched 1
>>  qemu-system-x86-2522  [002] 102242.664448: kvm_update_master_clock:
>> masterclock 0 hostclock tsc offsetmatched 1
>>
>>
>> If I boot a UP guest, tracing says:
>>
>>  qemu-system-x86-2567  [001] 102370.447484: kvm_update_master_clock:
>> masterclock 0 hostclock tsc offsetmatched 1
>>  qemu-system-x86-2571  [002] 102370.447688: kvm_update_master_clock:
>> masterclock 0 hostclock tsc offsetmatched 1
>>
>> I suspect, but I haven't verified, that this is fallout from:
>>
>> commit 16a9602158861687c78b6de6dc6a79e6e8a9136f
>> Author: Marcelo Tosatti 
>> Date:   Wed May 14 12:43:24 2014 -0300
>>
>> KVM: x86: disable master clock if TSC is reset during suspend
>>
>> Updating system_time from the kernel clock once master clock
>> has been enabled can result in time backwards event, in case
>> kernel clock frequency is lower than TSC frequency.
>>
>> Disable master clock in case it is necessary to update it
>> from the resume path.
>>
>> Signed-off-by: Marcelo Tosatti 
>> Signed-off-by: Paolo Bonzini 
>>
>>
>> Can we please stop making kvmclock more complex?  It's a beast right
>> now, and not in a good way.  It's far too tangled with the vclock
>> machinery on both the host and guest sides, the pvclock stuff is not
>> well thought out (even in principle in an ABI sense), and it's never
>> been clear to me what problem exactly the kvmclock stuff is supposed
>> to solve.
>>
>> I'm somewhat tempted to suggest that we delete kvmclock entirely and
>> start over.  A correctly functioning KVM guest using TSC (i.e.
>> ignoring kvmclock entirely) seems to work rather more reliably and
>> considerably faster than a kvmclock guest.
>>
>> --Andy
>>
>> --
>> Andy Lutomirski
>> AMA Capital Management, LLC
>
> Andy,
>
> I am all for solving practical problems rather than satisfying
> aesthetic preferences.
>
>> Updating system_time from the kernel clock once master clock
>> has been enabled can result in time backwards event, in case
>> kernel clock frequency is lower than TSC frequency.
>>
>> Disable master clock in case it is necessary to update it
>> from the resume path.
>
>> once master clock
>> has been enabled can result in time backwards event, in case
>> kernel clock frequency is lower than TSC frequency.
>
> guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc reads.
>
> If the effective frequency of the kernel clock is lower (for example
> due to NTP correcting the TSC frequency of the system), and you resume
> and update the system, the following happens:
>
> guest visible clock = tsc_timestamp (updated at time 0) + scaled tsc 
> reads=LARGE VALUE.
> suspend/resume event.
> guest visible clock = tsc_timestamp (updated at time N) + scaled tsc reads=0.
>

I'm still not seeing the issue.

The formula is:

(((rdtsc - pvti->tsc_timestamp) * pvti->tsc_to_system_mul) >>
pvti->tsc_shift) + pvti->system_time

Obviously, if you reset pvti->tsc_timestamp to the current tsc value
after suspend/resume, you would also need to update system_time.

I don't see what this has to do with suspend/resume or with whether
the effective scale factor is greater than or less than one.  The only
suspend/resume interaction I can see is that, if the host allows the
guest-observed TSC value to jump (which is arguably a bug, but that's
not important here), it needs to update pvti before resuming the
guest.
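
To make the formula above concrete, here is a minimal C sketch. The
names pvti_sketch, pvti_read_ns and pvti_rebase are hypothetical and
only mirror the fields named in this message; the real pvclock ABI
structure and its exact scaling rules are not reproduced here. The
sketch illustrates the point that resetting tsc_timestamp is harmless
as long as system_time is updated in the same step.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the pvti fields named above. */
struct pvti_sketch {
    uint64_t tsc_timestamp;     /* TSC value at the last host update */
    uint64_t system_time;       /* guest time (ns) at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* scale multiplier */
    uint8_t  tsc_shift;         /* right shift applied after the multiply */
};

/* The formula from the message, written out literally.
 * Sketch only: a large delta would overflow the 64-bit product. */
static uint64_t pvti_read_ns(const struct pvti_sketch *p, uint64_t tsc)
{
    assert(p->tsc_shift < 64);
    uint64_t delta = tsc - p->tsc_timestamp;
    return ((delta * p->tsc_to_system_mul) >> p->tsc_shift) + p->system_time;
}

/* Move the reference point to tsc_now without changing what the guest
 * reads: system_time is recomputed together with tsc_timestamp. */
static void pvti_rebase(struct pvti_sketch *p, uint64_t tsc_now)
{
    uint64_t now_ns = pvti_read_ns(p, tsc_now);
    p->tsc_timestamp = tsc_now;
    p->system_time = now_ns;
}
```

With both fields updated together, a reading taken just after the
rebase matches one taken just before it, whatever the scale factor is.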

Can you clarify concretely what goes wrong 

Re: kvmclock doesn't work, help?

2015-12-09 Thread Andy Lutomirski
On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini  wrote:
>
>
> On 09/12/2015 22:10, Andy Lutomirski wrote:
>> Can we please stop making kvmclock more complex?  It's a beast right
>> now, and not in a good way.  It's far too tangled with the vclock
>> machinery on both the host and guest sides, the pvclock stuff is not
>> well thought out (even in principle in an ABI sense), and it's never
>> been clear to me what problem exactly the kvmclock stuff is supposed
>> to solve.
>
> It's supposed to solve the problem that:
>
> - not all hosts have a working TSC

Fine, but we don't need any vdso integration for that.

>
> - even if they all do, virtual machines can be migrated (or
> saved/restored) to a host with a different TSC frequency

OK, I buy that.  So we want to export a linear function that the guest
applies to the TSC.

I suppose we also want ntp frequency corrections on the host to
propagate to the guest.
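
As a rough illustration of the "linear function" idea, the C sketch
below keeps guest time continuous when the TSC frequency changes, as
it would after migration.  Everything here is an assumption made for
illustration: the names (linear_clock, scale_for_khz, cycles_to_ns,
clock_read), the fixed 32-bit fixed-point shift, and the frequencies
are not taken from KVM.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SHIFT 32 /* assumed fixed-point precision */

/* ns per cycle = 1e6 / tsc_khz, stored as a fixed-point multiplier. */
static uint64_t scale_for_khz(uint64_t tsc_khz)
{
    assert(tsc_khz > 0);
    return (1000000ULL << SKETCH_SHIFT) / tsc_khz;
}

/* Sketch only: large cycle counts would overflow the 64-bit product. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t mul)
{
    return (cycles * mul) >> SKETCH_SHIFT;
}

/* Linear function exported to the guest: ns = ns_base + scale * (tsc - tsc_base). */
struct linear_clock {
    uint64_t tsc_base; /* TSC value at the reference point */
    uint64_t ns_base;  /* guest time (ns) at the reference point */
    uint64_t mul;      /* fixed-point ns-per-cycle multiplier */
};

static uint64_t clock_read(const struct linear_clock *c, uint64_t tsc)
{
    return c->ns_base + cycles_to_ns(tsc - c->tsc_base, c->mul);
}
```

On a move to a host with a different frequency, the new parameters
would take ns_base from the last reading on the old host, tsc_base from
the new host's TSC, and mul from the new frequency.  Host NTP frequency
corrections could be propagated the same way, by republishing mul.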

>
> - any MMIO- or PIO-based mechanism to access the current time is orders
> of magnitude slower than the TSC and less precise too.

Yup.  But TSC by itself gets that benefit, too.

>
>> I'm somewhat tempted to suggest that we delete kvmclock entirely and
>> start over.  A correctly functioning KVM guest using TSC (i.e.
>> ignoring kvmclock entirely) seems to work rather more reliably and
>> considerably faster than a kvmclock guest.
>
> If all your hosts have a working TSC and you don't do migration or
> save/restore, that's a valid configuration.  It's not a good default,
> however.

Er?

kvmclock is still really quite slow and buggy.  And the patch I
identified is definitely a problem here:

[  136.131241] KVM: disabling fast timing permanently due to inability
to recover from suspend

I got that on the host with this whitespace-damaged patch:

if (backwards_tsc) {
u64 delta_cyc = max_tsc - local_tsc;
+   if (!backwards_tsc_observed)
+   pr_warn("KVM: disabling fast timing
permanently due to inability to recover from suspend\n");

when I suspended and resumed.

Can anyone explain what problem
16a9602158861687c78b6de6dc6a79e6e8a9136f is supposed to solve?  On
brief inspection, it just seems to be incorrect.  Shouldn't KVM's
normal TSC logic handle that case right?  After all, all vcpus should
be paused when we resume from suspend.  At worst, we should just need
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) on all vcpus.  (Actually,
shouldn't we do that regardless of which way the TSC jumped on
suspend/resume?  After all, the TSC-to-wall-clock offset is quite
likely to change except on the very small handful of CPUs (if any)
that keep the TSC running in S3 and hibernate.)

--Andy


Re: kvmclock doesn't work, help?

2015-12-09 Thread Andy Lutomirski
On Wed, Dec 9, 2015 at 2:12 PM, Paolo Bonzini  wrote:
>
>
> On 09/12/2015 22:49, Andy Lutomirski wrote:
>> On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini  wrote:
>>>
>>>
>>> On 09/12/2015 22:10, Andy Lutomirski wrote:
>>>> Can we please stop making kvmclock more complex?  It's a beast right
>>>> now, and not in a good way.  It's far too tangled with the vclock
>>>> machinery on both the host and guest sides, the pvclock stuff is not
>>>> well thought out (even in principle in an ABI sense), and it's never
>>>> been clear to me what problem exactly the kvmclock stuff is supposed
>>>> to solve.
>>>
>>> It's supposed to solve the problem that:
>>>
>>> - not all hosts have a working TSC
>>
>> Fine, but we don't need any vdso integration for that.
>
> Well, you still want a fast time source.  That was a given. :)

If the host can't figure out how to give *itself* a fast time source,
I'd be surprised if KVM can manage to give the guest a fast, reliable
time source.

>
>>> - even if they all do, virtual machines can be migrated (or
>>> saved/restored) to a host with a different TSC frequency
>>>
>>> - any MMIO- or PIO-based mechanism to access the current time is orders
>>> of magnitude slower than the TSC and less precise too.
>>
>> Yup.  But TSC by itself gets that benefit, too.
>
> Yes, the problem is if you want to solve all three of them.  The first
> two are solved by the ACPI PM timer with a decent resolution (70
> ns---much faster anyway than an I/O port access).  The third is solved
> by TSC.  To solve all three, you need kvmclock.

Still confused.  Is kvmclock really used in cases where even the host
can't pull off a working TSC?

>
>>>> I'm somewhat tempted to suggest that we delete kvmclock entirely and
>>>> start over.  A correctly functioning KVM guest using TSC (i.e.
>>>> ignoring kvmclock entirely) seems to work rather more reliably and
>>>> considerably faster than a kvmclock guest.
>>>
>>> If all your hosts have a working TSC and you don't do migration or
>>> save/restore, that's a valid configuration.  It's not a good default,
>>> however.
>>
>> Er?
>>
>> kvmclock is still really quite slow and buggy.
>
> Unless it takes 3-4000 clock cycles for a gettimeofday, which it
> shouldn't even with vdso disabled, it's definitely not slower than PIO.
>
>> And the patch I identified is definitely a problem here:
>>
>> [  136.131241] KVM: disabling fast timing permanently due to inability
>> to recover from suspend
>>
>> I got that on the host with this whitespace-damaged patch:
>>
>> if (backwards_tsc) {
>> u64 delta_cyc = max_tsc - local_tsc;
>> +   if (!backwards_tsc_observed)
>> +   pr_warn("KVM: disabling fast timing
>> permanently due to inability to recover from suspend\n");
>>
>> when I suspended and resumed.
>>
>> Can anyone explain what problem
>> 16a9602158861687c78b6de6dc6a79e6e8a9136f is supposed to solve?  On
>> brief inspection, it just seems to be incorrect.  Shouldn't KVM's
>> normal TSC logic handle that case right?  After all, all vcpus should
>> be paused when we resume from suspend.  At worst, we should just need
>> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) on all vcpus.  (Actually,
>> shouldn't we do that regardless of which way the TSC jumped on
>> suspend/resume?  After all, the TSC-to-wall-clock offset is quite
>> likely to change except on the very small handful of CPUs (if any)
>> that keep the TSC running in S3 and hibernate.)
>
> I don't recall the details of that patch, so Marcelo will have to answer
> this, or Alex too since he chimed in the original thread.  At least it
> should be made conditional on the existence of a VM at suspend time (and
> the master clock stuff should be made per VM, as I suggested at
> https://www.mail-archive.com/kvm@vger.kernel.org/msg102316.html).
>
> It would indeed be great if the master clock could be dropped.  But I'm
> definitely missing some of the subtle details. :(

Me, too.

Anyway, see the attached untested patch.  Marcelo?

--Andy
From e4a5e834d3fb6fc2499966e1af42cb5bd59f4410 Mon Sep 17 00:00:00 2001
Message-Id: 
From: Andy Lutomirski 
Date: Wed, 9 Dec 2015 14:21:05 -0800
Subject: [PATCH] x86/kvm: On KVM re-enable (e.g. after suspend), update clocks

This gets rid of the "did TSC go backwards" logic and just updates
all clocks.  It should work better (no more disabling of fast
timing) and more reliably (all of the clocks are actually updated).

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kvm/x86.c | 75 +++---
 1 file changed, 3 insertions(+), 72 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eed32283d22c..c88f91f4b1a3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -123,8 +123,6 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
 unsigned 

Re: kvmclock doesn't work, help?

2015-12-09 Thread Paolo Bonzini


On 09/12/2015 23:27, Andy Lutomirski wrote:
> On Wed, Dec 9, 2015 at 2:12 PM, Paolo Bonzini  wrote:
>> On 09/12/2015 22:49, Andy Lutomirski wrote:
>>> On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini  wrote:


>>>> On 09/12/2015 22:10, Andy Lutomirski wrote:
>>>>> Can we please stop making kvmclock more complex?  It's a beast right
>>>>> now, and not in a good way.  It's far too tangled with the vclock
>>>>> machinery on both the host and guest sides, the pvclock stuff is not
>>>>> well thought out (even in principle in an ABI sense), and it's never
>>>>> been clear to me what problem exactly the kvmclock stuff is supposed
>>>>> to solve.
>>>>
>>>> It's supposed to solve the problem that:
>>>>
>>>> - not all hosts have a working TSC
>>>
>>> Fine, but we don't need any vdso integration for that.
>>
>> Well, you still want a fast time source.  That was a given. :)
> 
> If the host can't figure out how to give *itself* a fast time source,
> I'd be surprised if KVM can manage to give the guest a fast, reliable
> time source.

There's no vdso integration unless the host has a constant, nonstop
(fully "working") TSC.  That's the meaning of PVCLOCK_TSC_STABLE_BIT.

So, correction:  if you can pull it off, you still want a fast time
source.  Otherwise, you still want one that is as fast as possible,
especially on the kernel side.

>>>> - even if they all do, virtual machines can be migrated (or
>>>> saved/restored) to a host with a different TSC frequency
>>>>
>>>> - any MMIO- or PIO-based mechanism to access the current time is orders
>>>> of magnitude slower than the TSC and less precise too.
>>
>> the problem is if you want to solve all three of them.  The first
>> two are solved by the ACPI PM timer with a decent resolution (70
>> ns---much faster anyway than an I/O port access).  The third is solved
>> by TSC.  To solve all three, you need kvmclock.
> 
> Still confused.  Is kvmclock really used in cases where even the host
> can't pull off a working TSC?

You can certainly provide kvmclock even if you lack constant-rate or
nonstop TSC.  Those are only a requirement for vdso.

If the host has a constant-rate TSC, but the rate differs per physical
CPU (common on older NUMA machines), you can easily provide a working
kvmclock.  It cannot support vdso because you'll need to read the time
from a non-preemptable section, but it will work because KVM can update
the kvmclock parameters on VCPU migration, and it's still faster than
anything else.  (The purpose of the now-gone migration notifiers was to
support vdso even in this case).

If the host doesn't even have constant-rate TSC, you can still provide
kernel-only kvmclock reads through cpufreq notifiers.

Paolo


Re: kvmclock doesn't work, help?

2015-12-09 Thread Paolo Bonzini


On 09/12/2015 22:10, Andy Lutomirski wrote:
> Can we please stop making kvmclock more complex?  It's a beast right
> now, and not in a good way.  It's far too tangled with the vclock
> machinery on both the host and guest sides, the pvclock stuff is not
> well thought out (even in principle in an ABI sense), and it's never
> been clear to me what problem exactly the kvmclock stuff is supposed
> to solve.

It's supposed to solve the problem that:

- not all hosts have a working TSC

- even if they all do, virtual machines can be migrated (or
saved/restored) to a host with a different TSC frequency

- any MMIO- or PIO-based mechanism to access the current time is orders
of magnitude slower than the TSC and less precise too.

> I'm somewhat tempted to suggest that we delete kvmclock entirely and
> start over.  A correctly functioning KVM guest using TSC (i.e.
> ignoring kvmclock entirely) seems to work rather more reliably and
> considerably faster than a kvmclock guest.

If all your hosts have a working TSC and you don't do migration or
save/restore, that's a valid configuration.  It's not a good default,
however.

Paolo


Re: kvmclock doesn't work, help?

2015-12-09 Thread Andy Lutomirski
On Wed, Dec 9, 2015 at 2:27 PM, Andy Lutomirski  wrote:
> On Wed, Dec 9, 2015 at 2:12 PM, Paolo Bonzini  wrote:
>>
>>
>> On 09/12/2015 22:49, Andy Lutomirski wrote:
>>> On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini  wrote:


>>>> On 09/12/2015 22:10, Andy Lutomirski wrote:
>>>>> Can we please stop making kvmclock more complex?  It's a beast right
>>>>> now, and not in a good way.  It's far too tangled with the vclock
>>>>> machinery on both the host and guest sides, the pvclock stuff is not
>>>>> well thought out (even in principle in an ABI sense), and it's never
>>>>> been clear to me what problem exactly the kvmclock stuff is supposed
>>>>> to solve.
>>>>
>>>> It's supposed to solve the problem that:
>>>>
>>>> - not all hosts have a working TSC
>>>
>>> Fine, but we don't need any vdso integration for that.
>>
>> Well, you still want a fast time source.  That was a given. :)
>
> If the host can't figure out how to give *itself* a fast time source,
> I'd be surprised if KVM can manage to give the guest a fast, reliable
> time source.
>
>>
>>>> - even if they all do, virtual machines can be migrated (or
>>>> saved/restored) to a host with a different TSC frequency
>>>>
>>>> - any MMIO- or PIO-based mechanism to access the current time is orders
>>>> of magnitude slower than the TSC and less precise too.
>>>
>>> Yup.  But TSC by itself gets that benefit, too.
>>
>> Yes, the problem is if you want to solve all three of them.  The first
>> two are solved by the ACPI PM timer with a decent resolution (70
>> ns---much faster anyway than an I/O port access).  The third is solved
>> by TSC.  To solve all three, you need kvmclock.
>
> Still confused.  Is kvmclock really used in cases where even the host
> can't pull off a working TSC?
>
>>
>>>>> I'm somewhat tempted to suggest that we delete kvmclock entirely and
>>>>> start over.  A correctly functioning KVM guest using TSC (i.e.
>>>>> ignoring kvmclock entirely) seems to work rather more reliably and
>>>>> considerably faster than a kvmclock guest.
>>>>
>>>> If all your hosts have a working TSC and you don't do migration or
>>>> save/restore, that's a valid configuration.  It's not a good default,
>>>> however.
>>>
>>> Er?
>>>
>>> kvmclock is still really quite slow and buggy.
>>
>> Unless it takes 3-4000 clock cycles for a gettimeofday, which it
>> shouldn't even with vdso disabled, it's definitely not slower than PIO.
>>
>>> And the patch I identified is definitely a problem here:
>>>
>>> [  136.131241] KVM: disabling fast timing permanently due to inability
>>> to recover from suspend
>>>
>>> I got that on the host with this whitespace-damaged patch:
>>>
>>> if (backwards_tsc) {
>>> u64 delta_cyc = max_tsc - local_tsc;
>>> +   if (!backwards_tsc_observed)
>>> +   pr_warn("KVM: disabling fast timing
>>> permanently due to inability to recover from suspend\n");
>>>
>>> when I suspended and resumed.
>>>
>>> Can anyone explain what problem
>>> 16a9602158861687c78b6de6dc6a79e6e8a9136f is supposed to solve?  On
>>> brief inspection, it just seems to be incorrect.  Shouldn't KVM's
>>> normal TSC logic handle that case right?  After all, all vcpus should
>>> be paused when we resume from suspend.  At worst, we should just need
>>> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) on all vcpus.  (Actually,
>>> shouldn't we do that regardless of which way the TSC jumped on
>>> suspend/resume?  After all, the TSC-to-wall-clock offset is quite
>>> likely to change except on the very small handful of CPUs (if any)
>>> that keep the TSC running in S3 and hibernate.)
>>
>> I don't recall the details of that patch, so Marcelo will have to answer
>> this, or Alex too since he chimed in the original thread.  At least it
>> should be made conditional on the existence of a VM at suspend time (and
>> the master clock stuff should be made per VM, as I suggested at
>> https://www.mail-archive.com/kvm@vger.kernel.org/msg102316.html).
>>
>> It would indeed be great if the master clock could be dropped.  But I'm
>> definitely missing some of the subtle details. :(
>
> Me, too.
>
> Anyway, see the attached untested patch.  Marcelo?

That patch seems to work.  I have valid timing before and after host
suspend.  When I suspend and resume the host with a running guest, I
get:

[   26.504071] clocksource: timekeeping watchdog: Marking clocksource
'tsc' as unstable because the skew is too large:
[   26.505253] clocksource:   'kvm-clock' wd_now:
66744c542 wd_last: 564b09794 mask: 
[   26.506436] clocksource:   'tsc' cs_now:
fee310b133c8 cs_last: cf5d0b952 mask: 

in the guest, which is arguably correct.  KVM could be further
improved to update the tsc offset after suspend/resume to get rid of
that artifact.

--Andy

Re: kvmclock doesn't work, help?

2015-12-09 Thread Paolo Bonzini


On 09/12/2015 22:49, Andy Lutomirski wrote:
> On Wed, Dec 9, 2015 at 1:16 PM, Paolo Bonzini  wrote:
>>
>>
>> On 09/12/2015 22:10, Andy Lutomirski wrote:
>>> Can we please stop making kvmclock more complex?  It's a beast right
>>> now, and not in a good way.  It's far too tangled with the vclock
>>> machinery on both the host and guest sides, the pvclock stuff is not
>>> well thought out (even in principle in an ABI sense), and it's never
>>> been clear to me what problem exactly the kvmclock stuff is supposed
>>> to solve.
>>
>> It's supposed to solve the problem that:
>>
>> - not all hosts have a working TSC
> 
> Fine, but we don't need any vdso integration for that.

Well, you still want a fast time source.  That was a given. :)

>> - even if they all do, virtual machines can be migrated (or
>> saved/restored) to a host with a different TSC frequency
>> 
>> - any MMIO- or PIO-based mechanism to access the current time is orders
>> of magnitude slower than the TSC and less precise too.
> 
> Yup.  But TSC by itself gets that benefit, too.

Yes, the problem is if you want to solve all three of them.  The first
two are solved by the ACPI PM timer with a decent resolution (70
ns---much faster anyway than an I/O port access).  The third is solved
by TSC.  To solve all three, you need kvmclock.

>>> I'm somewhat tempted to suggest that we delete kvmclock entirely and
>>> start over.  A correctly functioning KVM guest using TSC (i.e.
>>> ignoring kvmclock entirely) seems to work rather more reliably and
>>> considerably faster than a kvmclock guest.
>>
>> If all your hosts have a working TSC and you don't do migration or
>> save/restore, that's a valid configuration.  It's not a good default,
>> however.
> 
> Er?
> 
> kvmclock is still really quite slow and buggy.

Unless it takes 3-4000 clock cycles for a gettimeofday, which it
shouldn't even with vdso disabled, it's definitely not slower than PIO.

> And the patch I identified is definitely a problem here:
> 
> [  136.131241] KVM: disabling fast timing permanently due to inability
> to recover from suspend
> 
> I got that on the host with this whitespace-damaged patch:
> 
> if (backwards_tsc) {
> u64 delta_cyc = max_tsc - local_tsc;
> +   if (!backwards_tsc_observed)
> +   pr_warn("KVM: disabling fast timing
> permanently due to inability to recover from suspend\n");
> 
> when I suspended and resumed.
> 
> Can anyone explain what problem
> 16a9602158861687c78b6de6dc6a79e6e8a9136f is supposed to solve?  On
> brief inspection, it just seems to be incorrect.  Shouldn't KVM's
> normal TSC logic handle that case right?  After all, all vcpus should
> be paused when we resume from suspend.  At worst, we should just need
> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) on all vcpus.  (Actually,
> shouldn't we do that regardless of which way the TSC jumped on
> suspend/resume?  After all, the TSC-to-wall-clock offset is quite
> likely to change except on the very small handful of CPUs (if any)
> that keep the TSC running in S3 and hibernate.)

I don't recall the details of that patch, so Marcelo will have to answer
this, or Alex too since he chimed in the original thread.  At least it
should be made conditional on the existence of a VM at suspend time (and
the master clock stuff should be made per VM, as I suggested at
https://www.mail-archive.com/kvm@vger.kernel.org/msg102316.html).

It would indeed be great if the master clock could be dropped.  But I'm
definitely missing some of the subtle details. :(

Paolo