I like the ideas of 2-phase aggregated attestation & cluster-by-cluster order.
But I want to understand the process more clearly. Without TCP, how does the engine handle the states of existing hosts while it is booting? Will the engine put all existing hosts in the non-operational state, perform some check via VDSM, and then turn them operational? Putting a host in the non-operational state will cause VM migration, right? Or is there a global state in the engine that indicates whether the user is allowed to create VMs?

Thanks
Jimmy

Itamar Heim wrote on 2013-04-28:
> On 04/28/2013 11:34 AM, Doron Fediuck wrote:
>> Hi Dave,
>>
>> Just to make sure I fully understand, I'll repeat your basic arguments;
>>
>> 1. It takes time to query a big number of hosts (hundreds).
>>
>> 2. When backend is booting, a user may start a VM on a host which was
>> hacked during the downtime of the engine.
>>
>> If the above is your concern, it shouldn't be so.
>> The reason is, that no host will become operational before you get a response
>> from the attestation server and allow it to become operational. So a user
>> cannot start a new VM on a non-operational host.
>
> i'd do the queries in groups of "cluster", so cluster-by-cluster they get
> unblocked. cluster without attestation service shouldn't block on this
> of course.
>
>>
>> What this means is that your thread may need to update the user by sending
>> a periodic event that a large scale attestation operation is in progress.
>> Other than that, maybe your thread can work in smaller groups if it gets
>> better results? ie- instead of one query for 300 hosts, maybe you can run
>> 3 serialized queries for 100 hosts each?
>> If this does not help, maybe you can run a short query for something like
>> 10 hosts, which should get an answer relatively fast. Then you can issue a
>> query for the other 290 hosts which will take longer. In this way the system
>> may get 10 hosts to work with quite fast, and later on the other 290 hosts
>> will join...
>> So this can actually be configurable to a 2-phase process;
>> a short query and a longer one. The admin can choose the short query size
>> based on his setup, and the longer query can pick up all the other hosts.
>> What do you think?
>>
>> Doron
>>
>> ----- Original Message -----
>>> From: "Wei D Chen" <[email protected]>
>>> To: "Doron Fediuck" <[email protected]>
>>> Cc: "Oved Ourfalli" <[email protected]>, [email protected]
>>> Sent: Saturday, April 27, 2013 9:36:44 AM
>>> Subject: Re: [Engine-devel] Design wiki page for trusted compute pools
>>> integration with oVirt has been updated
>>>
>>> Hi,
>>>
>>> Our current consideration is to add a new thread on the engine side to
>>> attest all hosts at once (an aggregated query to the attestation server)
>>> when the engine reboots. There is still one potential issue under
>>> extreme conditions. Say there are hundreds of nodes in a datacenter:
>>> attesting all hosts may still take a couple of minutes, so a hacked,
>>> untrusted node may be considered a trusted host until its latest
>>> status is received. So the worst case in a datacenter with hundreds
>>> of nodes is:
>>> 1. The engine goes down for some reason and boots up again; some
>>> trusted nodes may be hacked and rebooted during this period.
>>> 2. Our thread is running to get the status (trusted/untrusted) of all
>>> nodes, which may take a couple of minutes in a large datacenter.
>>> 3. A user creates VMs on these hacked nodes and believes these VMs are
>>> trusted VMs launched on trusted nodes.
>>> 4. Our thread gets the correct status of these untrusted nodes and
>>> sets these nodes as non-operational.
>>> 5. All of these "trusted" VMs running on these untrusted nodes are
>>> expected to migrate to other trusted nodes.
>>>
>>> So the question is: in a trusted cluster with hundreds of nodes, some
>>> VMs expected to be created on trusted nodes may actually be created on
>>> untrusted nodes instead, and this window may last a couple of minutes
>>> (worst case in my view is 10 mins with 1000 nodes).
>>> Is this acceptable from your point of view? Or any other suggestion?
>>>
>>>
>>> Best Regards,
>>> Dave Chen
>>>
>>>
>>> Doron Fediuck wrote on 2013-04-21:
>>>> integration with oVirt has been updated
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "Wei D Chen" <[email protected]>
>>>>> To: "Ofri Masad" <[email protected]>
>>>>> Cc: "Oved Ourfalli" <[email protected]>, [email protected]
>>>>> Sent: Sunday, April 21, 2013 4:00:55 PM
>>>>> Subject: Re: [Engine-devel] Design wiki page for trusted compute pools
>>>>> integration with oVirt has been updated
>>>>>
>>>>> Ofri,
>>>>>
>>>>> Absolutely right, an aggregated query is a significant time
>>>>> improvement compared to separate queries. I agree with an aggregated
>>>>> query on engine start. Is it possible to invoke the attestation
>>>>> service in the engine's initialization code block instead of a
>>>>> "quartz job"? Is there any class similar to "InitVdsOnUpCommand" for
>>>>> the engine's initialization?
>>>>>
>>>>> Best Regards,
>>>>> Dave Chen
>>>>>
>>>> org.ovirt.engine.core.bll.Backend.Initialize()
>>>>
>>>> Note you cannot block this method while waiting for results. Instead
>>>> I suggest you fire a one-time background request from this method.
>>>>
>>>>
>>>> Ofri Masad wrote on 2013-04-21:
>>>>> integration with oVirt has been updated
>>>>>
>>>>> Dave,
>>>>>
>>>>> If I'm not mistaken, there is a big difference between separate
>>>>> queries to the attestation server and an aggregated one?
>>>>> Is it true?
>>>>>
>>>>> Thanks,
>>>>> Ofri
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Itamar Heim" <[email protected]>
>>>>>> To: "Ofri Masad" <[email protected]>
>>>>>> Cc: "Oved Ourfalli" <[email protected]>, "Wei D Chen"
>>>>>> <[email protected]>, [email protected]
>>>>>> Sent: Sunday, April 21, 2013 10:20:17 AM
>>>>>> Subject: Re: [Engine-devel] Design wiki page for trusted compute
>>>>>> pools integration with oVirt has been updated
>>>>>>
>>>>>> On 04/21/2013 10:13 AM, Ofri Masad wrote:
>>>>>>> Hi,
>>>>>>> One more thing we need to think about for the second approach -
>>>>>>> aggregated query. On engine start we need to determine the trust
>>>>>>> state of all the hosts. Sending a separate query for each host
>>>>>>> will overload the attestation host and the network. An initial
>>>>>>> aggregated query needs to be sent when the engine starts.
>>>>>>> Same thing can happen after a management network failure and so on.
>>>>>>> Maybe we can run a quartz job every x minutes, checking if a large
>>>>>>> part of the hosts in the cluster (like 30%) are untrusted - in
>>>>>>> that case run the aggregated query.
>>>>>>
>>>>>> are we sure this optimization is needed?
>>>>>> how heavy/latent is the call to the attestation service?
>>>>>>
>>>>> _______________________________________________
>>>>> Engine-devel mailing list
>>>>> [email protected]
>>>>> http://lists.ovirt.org/mailman/listinfo/engine-devel
>>>>>
>>>
>>
>

Jimmy
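
To make sure I read the proposal correctly, here is a rough sketch of how I picture the 2-phase, cluster-by-cluster flow from the thread. This is only my assumption of the idea, not engine code: the class TwoPhaseAttestation, the methods planBatches and attestAll, and the aggregatedQuery function are all hypothetical names, and the real attestation client is replaced by a plain function from a list of host names to a trusted/untrusted map. Applying the short/long split inside each cluster is also my own combination of the two suggestions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only: none of these names exist in the oVirt engine.
class TwoPhaseAttestation {

    // Phase planning: a short first batch (size chosen by the admin),
    // then one longer batch with all remaining hosts.
    static List<List<String>> planBatches(List<String> hosts, int shortQuerySize) {
        List<List<String>> batches = new ArrayList<>();
        if (hosts.isEmpty()) {
            return batches;
        }
        int cut = Math.max(1, Math.min(shortQuerySize, hosts.size()));
        batches.add(new ArrayList<>(hosts.subList(0, cut)));
        if (cut < hosts.size()) {
            batches.add(new ArrayList<>(hosts.subList(cut, hosts.size())));
        }
        return batches;
    }

    // Every host starts as not operational. Clusters are attested one by one,
    // and a host only becomes operational after the (assumed) aggregated
    // query reports it as trusted.
    static Map<String, Boolean> attestAll(
            Map<String, List<String>> hostsByCluster,
            int shortQuerySize,
            Function<List<String>, Map<String, Boolean>> aggregatedQuery) {
        Map<String, Boolean> operational = new LinkedHashMap<>();
        for (List<String> hosts : hostsByCluster.values()) {
            for (String host : hosts) {
                operational.put(host, false);
            }
        }
        for (List<String> hosts : hostsByCluster.values()) {
            for (List<String> batch : planBatches(hosts, shortQuerySize)) {
                Map<String, Boolean> trusted = aggregatedQuery.apply(batch);
                for (String host : batch) {
                    // A missing answer is treated as untrusted.
                    operational.put(host, Boolean.TRUE.equals(trusted.get(host)));
                }
            }
        }
        return operational;
    }

    public static void main(String[] args) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        clusters.put("clusterA", Arrays.asList("a1", "a2", "a3"));
        clusters.put("clusterB", Arrays.asList("b1", "b2"));
        // Stand-in for the attestation server: reports "a2" as untrusted.
        Function<List<String>, Map<String, Boolean>> fakeQuery = batch -> {
            Map<String, Boolean> answer = new HashMap<>();
            for (String host : batch) {
                answer.put(host, !host.equals("a2"));
            }
            return answer;
        };
        System.out.println(attestAll(clusters, 1, fakeQuery));
    }
}
```

The sketch runs synchronously to keep it short. Since Backend.Initialize() must not block, I assume the real thing would be fired as a one-time background request, and a cluster without an attestation service would be skipped rather than held back.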
