On Wed, Dec 7, 2016 at 8:10 PM, Oved Ourfali <[email protected]> wrote:
> On Dec 7, 2016 16:00, "Nir Soffer" <[email protected]> wrote:
>>
>> On Wed, Dec 7, 2016 at 10:17 AM, Oved Ourfali <[email protected]> wrote:
>> >
>> > On Tue, Dec 6, 2016 at 11:12 PM, Adam Litke <[email protected]> wrote:
>> >>
>> >> On 06/12/16 22:06 +0200, Arik Hadas wrote:
>> >>>
>> >>> Adam,
>> >>
>> >> :) You seem upset. Sorry if I touched a nerve...
>> >>
>> >>> Just out of curiosity: when you write "v2v has promised" - what
>> >>> exactly do you mean? The tool? Richard Jones (the maintainer of
>> >>> virt-v2v)? Shahar and I, who implemented the integration with
>> >>> virt-v2v? I'm not aware of such a promise by any of these options :)
>> >>
>> >> Some history...
>> >>
>> >> Earlier this year Nir, Francesco (added), Shahar, and I began
>> >> discussing the similarities between what storage needed to do with
>> >> external commands and what was designed specifically for v2v. I am
>> >> not sure if you were involved in the project at that time. The plan
>> >> was to create common infrastructure that could be extended to fit
>> >> the unique needs of the verticals. The v2v code was going to be
>> >> moved over to the new infrastructure (see [1]) and the only thing
>> >> that stopped the initial patch was the lack of a VMware testing
>> >> environment for verification.
>> >>
>> >> At that time storage refocused on developing verbs that use the new
>> >> infrastructure and has been maintaining its suitability for general
>> >> use. Conversion of v2v -> Host Jobs is obviously a lower-priority
>> >> item and much more difficult now due to the early missed
>> >> opportunity.
>> >>
>> >>> Anyway, let's say that you were given such a promise by someone and
>> >>> thus consider that mechanism to be deprecated - it doesn't really
>> >>> matter.
>> >>
>> >> I may be biased but I think my opinion does matter.
>> >>
>> >>> The current implementation doesn't fit this flow well (it requires
>> >>> a per-volume job, it creates leases that are not needed for a
>> >>> template's disks, ...) and with the "next-gen API" with proper
>> >>> support for virt flows not even being discussed with us (and IIUC
>> >>> also not with the infra team) yet, I don't understand what you
>> >>> suggest except for some strong, though irrelevant, statements.
>> >>
>> >> If you are willing to engage in a good-faith technical discussion I
>> >> am sure I can help you to understand. These operations on storage
>> >> demand some form of locking protection. If volume leases aren't
>> >> appropriate then perhaps we should use the VM Leases / xleases that
>> >> Nir is finishing off for 4.1 now.
>> >>
>> >>> I suggest loud and clear to reuse (not to add dependencies, not to
>> >>> enhance, ...) an existing mechanism for a very similar flow of
>> >>> virt-v2v that works well and is simple.
>> >>
>> >> I clearly remember discussions involving infra (hello Oved), virt
>> >> (hola Michal), and storage where we decided that new APIs performing
>> >> async operations involving external commands should use the HostJobs
>> >> infrastructure instead of adding more information to Host Stats.
>> >> These were the "famous" entity polling meetings.
>>
>> We discussed these issues behind closed doors, not on the public
>> mailing list, so it is not surprising that people do not know about
>> the agreements we had.
>
> The core team was there. So it is surprising.
>
>> >> Of course plans can change but I have never been looped into any
>> >> such discussions.
>>
>> > Well, I think that when someone builds a good infrastructure he
>> > first needs to talk to all consumers and make sure it fits.
>> > In this case it seems like most of the work was done to fit the
>> > storage use case, and now you are checking whether it can fit others
>> > as well...
>>
>> The jobs framework is generic and can be used by any subsystem; there
>> is nothing storage-specific about it. But modifying disks *is* a
>> storage operation, even if someone from the virt team worked on it.
>>
>> V2v is also a storage operation - if we compare it with copying disks:
>>
>> - we create a new volume that nobody is using yet
>> - if the operation fails, the disk must be in an illegal state
>> - if the operation fails, we delete the disks
>> - if the operation succeeds, the volume must be legal
>> - we need to limit the number of operations on a host
>> - we need to detect the job state if the host becomes non-responsive
>> - we may want to fence the job if the host becomes non-responsive;
>>   with volume jobs, we can increment the volume generation and run
>>   the same job on another host
>> - we want to take a lease on storage to ensure that other hosts cannot
>>   access the same entity, or that the job will fail if someone else is
>>   using this entity
>> - we want to take a lease on storage to ensure that a job cannot get
>>   stuck for a long time - sanlock kills the owner of a lease when
>>   storage becomes inaccessible
>> - we want to report progress
>>
>> sysprep is less risky because the operation is faster, but on storage
>> even a fast operation can get stuck for minutes.
>>
>> We need to agree on a standard way to do such operations that is safe
>> enough and can be managed on the engine side.
>>
>> > IMO it makes much more sense to use events where possible (and
>> > you've promised to use those as well, but I don't see you doing
>> > that...). v2v should use events for sure, and they have promised to
>> > do that in the past, instead of using the v2v jobs.
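The requirements Nir lists amount to a small lifecycle around a storage
lease. A minimal sketch of that lifecycle, using hypothetical Volume and
Lease stand-ins rather than vdsm's actual classes:

```python
# Hypothetical sketch of the volume-job lifecycle described above.
# Volume and Lease are illustrative stand-ins, not vdsm's real objects.

class JobFailed(Exception):
    pass

def run_volume_job(volume, lease, operation):
    # The storage lease keeps other hosts away from the volume, and
    # sanlock kills the lease owner if storage becomes inaccessible.
    lease.acquire()
    try:
        volume.set_legality("ILLEGAL")   # disk is illegal while modified
        try:
            operation(volume)
        except Exception as e:
            volume.delete()              # failed job: delete the new disk
            raise JobFailed(str(e))
        volume.set_legality("LEGAL")     # only success restores legality
    finally:
        lease.release()
```

Generation handling and job-state reporting are omitted; the point is
only that the failure rules in the list map directly onto the
try/except/finally structure.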
>> > The reason events weren't used originally with the v2v feature was
>> > that it was too risky and the events infrastructure was added too
>> > late in the game.
>>
>> Events do not replace the need for managing jobs on the vdsm side.
>> Engine must have a way to query the current jobs before subscribing
>> to events from these jobs; otherwise you will lose events, and engine
>> will never notice a completed job after network errors.
>>
>> The jobs framework supports events, see
>> https://gerrit.ovirt.org/67118
>>
>> We are waiting for review from the infra team; maybe you can get
>> someone to review this?
>
> It would have been great to review the design for this before it
> reaches gerrit.
> Anyway, I get a permissions error when opening it. Any clue why?
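The recovery pattern Nir describes above - query the current jobs to
resynchronize, then rely on events - can be sketched as follows. The
class and method names are illustrative, not engine's actual code:

```python
# Hedged sketch of engine-side job tracking that survives network
# errors: events alone are insufficient because completion events
# emitted during an outage are lost, so on reconnect the engine
# reconciles a full snapshot of vdsm's current jobs with later events.
# JobTracker and FakeHost-style interfaces are assumptions for
# illustration only.

class JobTracker:
    def __init__(self):
        self.jobs = {}  # job_id -> last known status

    def resync(self, host):
        """Called on (re)connect, before trusting the event stream."""
        snapshot = {j["id"]: j["status"] for j in host.get_jobs()}
        # Jobs we were tracking that vdsm no longer reports ended while
        # we were disconnected; engine must poll the underlying entity
        # to learn the final outcome.
        for job_id in set(self.jobs) - set(snapshot):
            self.jobs[job_id] = "ended-while-disconnected"
        self.jobs.update(snapshot)

    def on_event(self, event):
        """Normal path: a job status update pushed by vdsm."""
        self.jobs[event["id"]] = event["status"]
```

This is why a polling verb must exist even in an events-based design:
the snapshot is what makes the event stream safe to resume.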
It is a recent bug in gerrit, or a configuration issue - drafts are
private sometimes. I added you as a reviewer, can you see it now?

Nir

>
>> Nir
>>
>> >
>> >>> Do you "promise" to implement your "next-gen API" for 4.1 as an
>> >>> alternative?
>> >>
>> >> I guess we need the design first.
>> >>
>> >>> On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <[email protected]> wrote:
>> >>>
>> >>> On 05/12/16 11:17 +0200, Arik Hadas wrote:
>> >>>
>> >>> On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <[email protected]>
>> >>> wrote:
>> >>>
>> >>> On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud
>> >>> <[email protected]> wrote:
>> >>> >
>> >>> > Hi!
>> >>> >
>> >>> > I'm currently working on integration of virt-sysprep into oVirt.
>> >>> >
>> >>> > Usually, if a user creates a template from a regular VM, and then
>> >>> > creates new VMs from this template, these new VMs inherit all
>> >>> > configuration of the original VM, including SSH keys, udev rules,
>> >>> > MAC addresses, system ID, hostname, etc. This is unfortunate,
>> >>> > because you cannot have two network devices with the same MAC
>> >>> > address in the same network, for example.
>> >>> >
>> >>> > To avoid this, the user must clean all machine-specific
>> >>> > configuration from the original VM before creating a template
>> >>> > from it. You can do this manually, but there is a virt-sysprep
>> >>> > utility that does it automatically.
>> >>> >
>> >>> > Ideally, virt-sysprep should be seamlessly integrated into the
>> >>> > template creation process. But the first step is to create a
>> >>> > simple button: the user selects a VM, clicks the button, and
>> >>> > oVirt executes virt-sysprep on the VM.
>> >>> >
>> >>> > virt-sysprep works directly on the VM's filesystem.
>> >>> > It accepts a list of all disks of the VM as parameters:
>> >>> >
>> >>> > virt-sysprep -a disk1.img -a disk2.img -a disk3.img
>> >>> >
>> >>> > The architecture is as follows: a command on the Engine side runs
>> >>> > a job on the VDSM side and tracks its success/failure. The job on
>> >>> > the VDSM side runs virt-sysprep.
>> >>> >
>> >>> > The question is how to implement the job correctly?
>> >>> >
>> >>> > I thought about using storage jobs, but they are designed to work
>> >>> > only with a single volume, correct?
>> >>>
>> >>> New storage verbs are volume based. This makes it easy to manage
>> >>> them on the engine side, and will allow parallelizing volume
>> >>> operations on single or multiple hosts.
>> >>>
>> >>> A storage volume job uses a sanlock lease on the modified volume
>> >>> and a volume generation number. If a host running pending jobs
>> >>> becomes non-responsive and cannot be fenced, we can detect the
>> >>> state of the job, fence the job, and start the job on another host.
>> >>>
>> >>> With the SPM task, if a host becomes non-responsive and cannot be
>> >>> fenced, the whole setup is stuck; there is no way to perform any
>> >>> storage operation.
>> >>>
>> >>> > Is it possible to use them with an operation that is performed on
>> >>> > multiple volumes?
>> >>> > Or, alternatively, is it possible to use some kind of 'VM jobs'
>> >>> > that work on the VM as a whole?
>> >>>
>> >>> We can do:
>> >>>
>> >>> 1. Add jobs with multiple volume leases - can make error handling
>> >>>    very complex. How do you tell a job's state if you have multiple
>> >>>    leases? Which volume generation do you use?
>> >>>
>> >>> 2. Use a volume job using one of the volumes (the boot volume?).
>> >>>    This does not protect the other volumes from modification, but
>> >>>    engine is responsible for this.
>> >>>
>> >>> 3. Use new "VM jobs", using a VM lease (should be available this
>> >>>    week on master). This protects a VM during sysprep from being
>> >>>    started. We still need a generation to detect the job state; I
>> >>>    think we can use the sanlock lease generation for this.
>> >>>
>> >>> I like the last option since sysprep is much like running a VM.
>> >>>
>> >>> > How does v2v solve this problem?
>> >>>
>> >>> It does not.
>> >>>
>> >>> v2v predates storage volume jobs. It does not use volume leases
>> >>> and generations, and does not have any way to recover if a host
>> >>> running v2v becomes non-responsive and cannot be fenced.
>> >>>
>> >>> It also does not use the jobs framework and does not use a thread
>> >>> pool for v2v jobs, so it has no limit on the number of storage
>> >>> operations on a host.
>> >>>
>> >>> Right, but let's be fair and present the benefits of v2v-jobs as
>> >>> well:
>> >>> 1. it is the simplest "infrastructure" in terms of LOC
>> >>>
>> >>> It is also deprecated. V2V has promised to adopt the richer Host
>> >>> Jobs API in the future.
>> >>>
>> >>> 2. it is the most efficient mechanism in terms of interactions
>> >>> between the engine and VDSM (it doesn't require new verbs/calls,
>> >>> the data is attached to VdsStats; probably the easiest mechanism
>> >>> to convert to events)
>> >>>
>> >>> Engine is already polling the host jobs API, so I am not sure I
>> >>> agree with you here.
>> >>>
>> >>> 3. it is the most efficient implementation in terms of interaction
>> >>> with the database (no data is persisted into the database, no
>> >>> polling is done)
>> >>>
>> >>> Again, we're already using the Host Jobs API. We'll gain
>> >>> efficiency by migrating away from the old v2v API and having a
>> >>> single, unified approach (Host Jobs).
>> >>>
>> >>> Currently we have 3 mechanisms to report jobs:
>> >>> 1. VM jobs - currently used for live merge. This requires the VM
>> >>> entity to exist in VDSM, thus not suitable for virt-sysprep.
>> >>>
>> >>> Correct, not appropriate for this application.
>> >>>
>> >>> 2. storage jobs - complicated infrastructure, targeted at
>> >>> recovering from failures to maintain storage consistency. Many of
>> >>> the things this infrastructure knows how to handle are irrelevant
>> >>> for the virt-sysprep flow, and the fact that virt-sysprep is
>> >>> invoked on a VM rather than a particular disk makes it less
>> >>> suitable.
>> >>>
>> >>> These are more appropriately called HostJobs and they have the
>> >>> following semantics:
>> >>> - They represent an external process running on a single host
>> >>> - They are not persisted. If the host or vdsm restarts, the job is
>> >>>   aborted
>> >>> - They operate on entities. Currently storage is the first adopter
>> >>>   of the infrastructure, but virt was going to adopt these for the
>> >>>   next-gen API. Entities can be volumes, storage domains, VMs,
>> >>>   network interfaces, etc.
>> >>> - Job status and progress is reported by the Host Jobs API. If a
>> >>>   job is not present, then the underlying entities must be polled
>> >>>   by engine to determine the actual state.
>> >>>
>> >>> 3. V2V jobs - no mechanism is provided to resume failed jobs, no
>> >>> leases, etc.
>> >>>
>> >>> This is the old infra upon which Host Jobs are built.
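The HostJob semantics listed above can be restated schematically. This
is an illustrative class, not vdsm's actual jobs code:

```python
# Illustrative sketch of the HostJob contract described in the thread:
# an external process on a single host, tied to entities, not persisted
# across restarts. Class and attribute names are assumptions.

class HostJob:
    """An ephemeral job running on a single host, tied to entities."""

    def __init__(self, job_id, entities):
        self.id = job_id
        self.entities = entities  # volumes, VMs, network interfaces, ...
        self.status = "pending"   # never persisted: a vdsm restart
                                  # effectively aborts the job
        self.progress = 0

    def run(self):
        self.status = "running"
        try:
            self._execute()
        except Exception:
            self.status = "failed"
        else:
            self.progress = 100
            self.status = "done"

    def _execute(self):
        # Subclasses run the external command here.
        raise NotImplementedError

# When a job is absent from the Host Jobs API (e.g. after a restart),
# engine falls back to polling the underlying entities for their state.
```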
>> >>> v2v has promised to move to Host Jobs in the future, so we should
>> >>> not add new dependencies to this code.
>> >>>
>> >>> I have some arguments for using V2V-like jobs [1]:
>> >>> 1. creating a template from a VM is rarely done - if a host goes
>> >>> unresponsive or any other failure is detected, we can just remove
>> >>> the template and report the error
>> >>>
>> >>> We can choose this error handling with Host Jobs as well.
>> >>>
>> >>> 2. the virt-sysprep phase is, unlike a typical storage operation,
>> >>> short - reducing the risk of failures during the process
>> >>>
>> >>> Reduced risk of failures is never an excuse for lax error
>> >>> handling. The storage-flavored host jobs provide tons of utilities
>> >>> for making error handling standardized, easy to implement, and
>> >>> correct.
>> >>>
>> >>> 3. during the operation the VM is down - by locking the
>> >>> VM/template and its disks on the engine side, we render a
>> >>> leases-like mechanism redundant
>> >>>
>> >>> Eventually we want to protect all operations on storage with
>> >>> sanlock leases. This is safer and allows for a more distributed
>> >>> approach to management. Again, using leases correctly in host jobs
>> >>> requires about 5 lines of code. The benefits of standardization
>> >>> far outweigh any perceived simplification resulting from omitting
>> >>> it.
>> >>>
>> >>> 4. in the worst case - the disk will not be corrupted (only some
>> >>> of the data might be removed).
>> >>>
>> >>> Again, the way engine chooses to handle job failures is
>> >>> independent of the mechanism. Let's separate that from this
>> >>> discussion.
>> >>>
>> >>> So I think that the mechanism for storage jobs is overkill for
>> >>> this case.
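Whichever jobs framework wins, the job body under discussion is small:
it builds and runs the virt-sysprep command Shmuel quoted, with one -a
per disk. A hedged sketch (the helper below is hypothetical, not vdsm
code):

```python
# Hypothetical sketch of the virt-sysprep job body: build the command
# line quoted earlier in the thread (virt-sysprep -a disk1.img -a ...)
# and run it. Not vdsm's actual implementation.

import subprocess

def sysprep_command(disk_paths):
    cmd = ["virt-sysprep"]
    for path in disk_paths:
        cmd.extend(["-a", path])  # one -a per VM disk
    return cmd

def run_sysprep(disk_paths):
    # virt-sysprep modifies the filesystems directly, so the VM must be
    # down (and, per the thread, protected by a lease) while this runs.
    subprocess.check_call(sysprep_command(disk_paths))
```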
>> >>> We can keep it simple by generalizing the V2V job for other
>> >>> virt-tools jobs, like virt-sysprep.
>> >>>
>> >>> I think we ought to standardize on the Host Jobs framework, where
>> >>> we can collaborate on unit tests, standardized locking and error
>> >>> handling, abort logic, etc. When v2v moves to host jobs we will
>> >>> have a unified method of handling ephemeral jobs that are tied to
>> >>> entities.
>> >>>
>> >>> --
>> >>> Adam Litke
>> >>
>> >> --
>> >> Adam Litke
>> >
>> > _______________________________________________
>> > Devel mailing list
>> > [email protected]
>> > http://lists.ovirt.org/mailman/listinfo/devel
