Hi there,

Alex Huang and I are aiming to finish debugging the VMSync improvement by the end of this week. I'd like to encourage anyone who is interested in, or has concerns about, this project to review it on the vmsync branch as soon as possible. We will propose the official merge request soon after it has passed our internal tests.

Some details were posted to the wiki a while ago:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/FS+-+VMSync+improvement

Here is a summary of the change.

1. We changed the underlying VM state sync modeling

Those who are familiar with the old Microsoft COM model may recall the "free-threading model" and the "thread-apartment model". The VM state sync modeling change is similar to switching from free-threading to thread-apartment modeling.

Previously, VM state changes were reported to and processed in the management server in a "free-threading" fashion: regardless of whether there was an active process working on the subject VM, the state sync process was always executed in place. This approach is inherently complex with respect to concurrency, since all sync processing is concentrated in one place, and it led to complex code logic that is hard to change and maintain.

This change introduces a major modeling shift. We now use an approach we can call the "job-apartment" model, comparable to Microsoft COM's "thread-apartment model": the sync logic runs within the context of the process working on the VM and is de-centralized across the board. This simplifies each piece of VM state sync logic individually and leaves the complexity to the underlying framework, which can be optimized separately in the future without affecting the business layer (separation of concerns at the architecture level).

2. De-couple the hypervisor resource agent from managing VM state in the Cloud layer

We also changed how the resource agent is involved in the overall VM state sync process. Previously, the resource agent had to participate closely in VM state management in the Cloud layer. This requirement has been removed: the resource agent is no longer required to help maintain "delta" state in the overall VM state management. All it needs to do is report what it knows about VM state at the virtualization layer, leaving all of the handling to the CloudStack management server.
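
As a rough illustration of item 2, here is a minimal Java sketch of an agent that only reports the states it sees, with the management server working out the differences on its own. It is not code from the vmsync branch: the class VmSyncSketch, the PowerState enum, the computeChanges method and the choice to mark unreported VMs as UNKNOWN are all hypothetical names and assumptions made for this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only; none of these names come from the vmsync branch.
public class VmSyncSketch {
    enum PowerState { RUNNING, STOPPED, UNKNOWN }

    /**
     * Server-side comparison: takes the state the management server has
     * recorded and the full report sent by the agent, and returns only the
     * VMs whose state differs. The agent itself never computes a delta.
     */
    static Map<String, PowerState> computeChanges(Map<String, PowerState> recorded,
                                                  Map<String, PowerState> reported) {
        Map<String, PowerState> changes = new TreeMap<>();
        // VMs present in the report: keep those that differ from the record.
        for (Map.Entry<String, PowerState> e : reported.entrySet()) {
            if (recorded.get(e.getKey()) != e.getValue()) {
                changes.put(e.getKey(), e.getValue());
            }
        }
        // VMs the server knows about but the agent did not mention.
        // Assumption for this sketch: treat them as UNKNOWN.
        for (Map.Entry<String, PowerState> e : recorded.entrySet()) {
            if (!reported.containsKey(e.getKey()) && e.getValue() != PowerState.UNKNOWN) {
                changes.put(e.getKey(), PowerState.UNKNOWN);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, PowerState> recorded = new TreeMap<>();
        recorded.put("vm-1", PowerState.RUNNING);
        recorded.put("vm-2", PowerState.RUNNING);
        recorded.put("vm-3", PowerState.STOPPED);

        Map<String, PowerState> reported = new TreeMap<>();
        reported.put("vm-1", PowerState.RUNNING);
        reported.put("vm-2", PowerState.STOPPED);

        // vm-2 changed state and vm-3 was not reported at all.
        System.out.println(computeChanges(recorded, reported));
    }
}
```

With this split, the agent side reduces to building the "reported" map from whatever the hypervisor says, and any policy about what a difference means stays on the management server.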

The reason for this change is to simplify the architecture between the resource agent and the management server. De-coupling in this way lowers the bar for developers writing a new hypervisor resource agent and also gives management server developers room to optimize sync logic independently (again, separation of concerns at the architecture level).

3. Job framework has been improved

To make the proposal possible, the job framework has been refactored to support more explicit management of jobs. Job joining, wake-up scheduling and serialized job execution have been added, together with a topic-based message bus facility.

4. Compile-time strong typing of Java generic usage in VirtualMachineManagerImpl

The job scheduling change requires more flexible run-time handling. Previously, however, VirtualMachineManagerImpl made heavy use of Java generics to take advantage of the compile-time strong typing that Java provides. This caused trouble with the object serialization that occurs across the boundaries of "job-apartments", so VirtualMachineManagerImpl has been refactored.

Flames and comments are all welcome.

Kelven