Dong, Eddie wrote:
> Avi Kivity wrote:
>   
>> Dong, Eddie wrote:
>>     
>>> Avi Kivity wrote:
>>>
>>>       
>
>   
>> It's just like the guest kernel executing hlt.  Why is there a
>> difference? 
>>     
>
> Current VP wake up logic thru INIT/SIPI doesn't support this when
> irqchip in kernel. 
>
>   

Doesn't this code imply that waiting for SIPI is supported?

> static void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> {
>     DECLARE_WAITQUEUE(wait, current);
>
>     add_wait_queue(&vcpu->wq, &wait);
>
>     /*
>      * We will block until either an interrupt or a signal wakes us up
>      */
>     while (!kvm_cpu_has_interrupt(vcpu)
>            && !signal_pending(current)
>            && vcpu->mp_state != VCPU_MP_STATE_RUNNABLE
>            && vcpu->mp_state != VCPU_MP_STATE_SIPI_RECEIVED) {
(note waiting for SIPI here)

>         set_current_state(TASK_INTERRUPTIBLE);
>         vcpu_put(vcpu);
>         schedule();
>         vcpu_load(vcpu);
>     }
>
>     __set_current_state(TASK_RUNNING);
>     remove_wait_queue(&vcpu->wq, &wait);
> }



>>> Yes, halt all APs and let BSP do reset ops in user level.
>>> Will post patch to Qemu to support SMP reboot some time later.
>>>
>>>
>>>       
>> Wait, that's a big change.  Need to think about this...
>>     
>
> Like talked in previous, we either let BSP do this, or let an AP
> become BSP. Current Qemu doesn't support this.
>
>   
>>> -   if (vcpu->requests)
>>> +   if (vcpu->requests) {
>>>             if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
>>>                     kvm_x86_ops->tlb_flush(vcpu);
>>> +           if (test_and_clear_bit(KVM_FROZEN, &vcpu->requests)) {
>>> +                   local_irq_enable(); +
>>>       
> preempt_enable();
>   
>>> +                   r = -EINTR;
>>> +                   kvm_run->exit_reason = KVM_EXIT_FROZEN;
>>> +                   goto out;
>>> +           }
>>> +   }
>>>
>>>       
>> Why not just call vcpu_reset() here, then call vcpu_halt() if
>> we're an AP?
>>     
>
> Then you need to duplicate following code, which may bring extra issue
> such as GUEST_CR3 update (kvm_mmu_reload) which is done after 
> this code but before vcpu->requests test. BTW, switch back to user level
> to retry is cleaner IMO.
>
>         if (unlikely(vcpu->mp_state == VCPU_MP_STATE_UNINITIALIZED)) {
>                 if (irqchip_in_kernel(vcpu->kvm) && vcpu->apic)
>                         kvm_lapic_reset(vcpu);
>                 kvm_vcpu_block(vcpu);
>                 vcpu_put(vcpu);
>                 return -EAGAIN;
>         }
>
>   

You can put a goto to the top of the loop to redo the mmu reload.  In 
any case you need to do that because you don't want to execute the reset 
code with interrupts and preemption disabled.

>>> +                   while (test_bit(KVM_FROZEN, &vcpu->requests))
>>> +                           schedule();
>>>
>>>       
>> I don't think we need to wait here.
>>     
>
> The VCPU may be executing in kernel still, which may modify kernel
> device state. E.g. A VCPU may be doing PIO emulating.
>
>   

In that case we will wait when taking kvm->lock.

> If BSP reset the kernel devices earlier than the VCPU modify the device
> state,
> we are in trouble.
>   

Right, so the logic is:

- send IPI to all vcpus to stop them running
- take the lock and reset all devices, srop the lock
- unblock vcpu 0

>   
>>> +           }
>>> +           else {
>>> +                   vcpu->mp_state = VCPU_MP_STATE_RUNNABLE;
>>> +                   kvm_lapic_reset(vcpu);
>>> +           }
>>> +   }
>>> +   /* Now only BSP is running... */
>>> +   kvm_reset_devices(kvm);
>>>
>>>       
>> But now you're reseting the devices while vcpu 0 may be
>> running.  If in
>>     
>
> No, VCPU0 (BSP) is current VCPU (though you don't have the current
> vcpu parameter explicitly) like mentioned in previous mail and
> as pre-requirement of user level change. Please refer my abswer above
> of this mail.
>
>   

We can't rely on user space not to cause host kernel corruption.

>> the first stage you halt all vcpus, and then restart vcpu 0 after the
>> reset, you avoid the race. 
>>
>>     
>
> No race here. The code is executing in VCPU0 context.
>
>   

It isn't guaranteed.

It's okay to rely on it for guest correctness (though undesirable), not 
host correctness.


-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Reply via email to