On 2010-08-02 22:26, Zachary Amsden wrote:
> On 08/02/2010 04:43 AM, Glauber Costa wrote:
>> On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote:
>>
>>> On 07/31/2010 06:36 AM, Arjan Koers wrote:
>>>
>>>> On 2010-07-31 13:53, Arjan Koers wrote:
>>>>
>>>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set.
>>>>>
>>>>>
>>>> The problem occurs when this message is printed:
>>>>
>>>> [ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock
>>>>
>>>> When I disable that printk, the kernel boots with
>>>> CONFIG_PRINTK_TIME=y
>>>>
>>>> --- a/arch/x86/kernel/kvmclock.c
>>>> +++ b/arch/x86/kernel/kvmclock.c
>>>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt)
>>>> int low, high;
>>>> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
>>>> high = ((u64)__pa(&per_cpu(hv_clock, cpu))>> 32);
>>>> - printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>>> - cpu, high, low, txt);
>>>> + /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
>>>> + cpu, high, low, txt);*/
>>>>
>>>> return native_write_msr_safe(msr_kvm_system_time, low, high);
>>>> }
>>>>
>>>> So the problem appears to be that the clock of the second CPU
>>>> is used too soon (or that clock setup should finish earlier).
>>>>
>>> That's almost hilarious. The printk from setting up the kvm clock
>>> is invoking the kvm clock before it is setup.
>>>
>>> There's no reason other printks couldn't do the same thing, however.
>>> I think it's safest to keep an initialized flag and check for it
>>> before attempting to return a meaningful value.
>>>
>> I was on vacations, just got back.
>>
>> I think it is safe to just patch our own use of it. Before that, all
>> other
>> printks will be handled by the main cpu anyway, since it'll be the
>> only one active
>> at the moment. The only possible offenders for this are us, and the
>> cpu initialization
>> code, which is already fragile in multiple ways anyway.
>>
>> A flag would only make things more complicated and dirty
>>
> Can we just do this?
Sorry, the patch doesn't help. See line 68 in my debug log:
65: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c0631e 375d47
13c5ce 15655813de60
66: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c0653b 375f64
13c68f 15655813df21
67: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c06746 37616f
13c74a 15655813dfdc
68: ffff880001511c00 1967ac192 15654c8d826a 63c6c 3bf58bf0ea18 3bf3f5762886
15695466a1e5 2acea0f4244f
69: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f424aa 28d0
e93 1565582659b1
70: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4a1e0 a606
3b4b 156558268669
71: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4ba63 be89
440b 156558268f29
72: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4d8e7 dd0d
4ef1 156558269a0f
73: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4d52c 371d6
13aef 15655827223a
74: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4ebec 38896
1430f 156558272a5a
I don't think that pvclock_clocksource_read is receiving
completely random uninitialized data. The values in shadow
are wrong, but could be interpreted as valid data
(shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff,
shadow.flags = 0 and shadow.version is always even).
I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
anyone needs them...
Move a printk that's using the clock before it's ready
Fix a hang during SMP kernel boot on KVM that showed up
after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
(2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
(2.6.34.1). The problem only occurs when
CONFIG_PRINTK_TIME is set.
Signed-off-by: Arjan Koers <[email protected]>
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index feaeb0d..71bf2df 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -125,12 +125,15 @@ static struct clocksource kvm_clock = {
static int kvm_register_clock(char *txt)
{
int cpu = smp_processor_id();
- int low, high;
+ int low, high, ret;
+
low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
+ ret = native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
cpu, high, low, txt);
- return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
+
+ return ret;
}
#ifdef CONFIG_X86_LOCAL_APIC
Move a printk that's using the clock before it's ready
Fix a hang during SMP kernel boot on KVM that showed up
after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
(2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
(2.6.34.1). The problem only occurs when
CONFIG_PRINTK_TIME is set.
Signed-off-by: Arjan Koers <[email protected]>
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index eb9b76c..ca43ce3 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -128,13 +128,15 @@ static struct clocksource kvm_clock = {
static int kvm_register_clock(char *txt)
{
int cpu = smp_processor_id();
- int low, high;
+ int low, high, ret;
+
low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
+ ret = native_write_msr_safe(msr_kvm_system_time, low, high);
printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
cpu, high, low, txt);
- return native_write_msr_safe(msr_kvm_system_time, low, high);
+ return ret;
}
#ifdef CONFIG_X86_LOCAL_APIC