Hi one more time. I will refactor the DB layer tomorrow. As I said, I don't want to be a blocker.
Best regards,
Boris Pavlovic
---
Mirantis Inc.

On Mon, Jul 22, 2013 at 11:08 PM, Boris Pavlovic <[email protected]> wrote:
> Ian,
>
> I don't like to make anything personal, but I have to state some facts:
>
> 1) I see tons of hands and only two solutions: mine and one more that
> is based on my code.
> 2) My code was published before the session (18 Apr 2013).
> 3) The blueprints from the summit were published (03 Mar 2013).
> 4) My blueprints were published (25 May 2013).
> 5) Patches based on my patch were not published until 5 Jul 2013.
>
> To all,
>
> I don't want to hide anything from the community, cores, or PTLs. I
> have only one goal, and it is to make OpenStack better.
>
> Recently I got a new task at my job: scalability/performance and
> benchmarks.
>
> So my colleagues and I started investigating the code around the
> scheduler. (Jiang, sorry for the two-week lag.)
>
> After investigation and testing we found that one of the reasons the
> scheduler is slow and scales poorly is its work with the DB. JOINs are
> slow and scale badly, and if we add one more JOIN, as PCI passthrough
> requires, the situation gets much worse.
>
> We then looked at how to reconcile two competing goals: scalability
> vs. flexibility.
> About flexibility: I don't think our current scheduler is convenient.
> I had to add 800 lines of code just to be able to use a list of JSON
> objects (with a complex structure) as one more resource in the
> scheduler. If we don't use a new table, we have to use some kind of
> key/value storage; with many key/value pairs we get performance and
> scalability problems, and with everything in one key/value we get
> races and tons of dirty code. So we would hit the same problems again
> in the future. Also, using resources from different providers (such as
> Cinder) is a pretty hard task.
>
> So Alexey K., Alexei O., and I found a way to make our scheduler work
> without the DB, with pretty small changes to the current solution. The
> new approach gives us scalability and flexibility at the same time.
> What scalability means here: "We don't need to store anything about
> PCI devices in the DB." We just add some small extra code to the
> resource tracker.
>
> I understand that it is too late to implement such things in H-3 (I
> absolutely agree with Russell), even if they require just 100-150
> lines of code.
>
> So if we implement the solution based on my current code, then after
> improving the scheduler we would have to:
> 1) remove the DB layer entirely
> 2) replace the compute.api layer completely
> 3) partially replace the scheduler layer
> 4) change compute.manager
> Only libvirt (which should still be improved) and auto-discovery would
> remain untouched (and at this moment they have not been reviewed
> enough).
>
> So I really don't understand why we should hurry. Why can't we first:
> 1) prepare the patches that improve the scheduler (before the summit)
> 2) prepare everything that will remain untouched (libvirt /
> auto-discovery) (before the summit)
> 3) discuss all of this one more time at the summit
> 4) implement all of this work in I-1
> 5) test and improve it during I-2 and I-3?
>
> I think that would be much better for the OpenStack code overall.
>
> If Russell and the other cores would like to implement the current PCI
> passthrough approach anyway, I won't block anything, and the DB layer
> will be finished tomorrow evening.
>
> Best regards,
> Boris Pavlovic
> ---
> Mirantis Inc.
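To make the "scheduler without DB" point above concrete, here is a
minimal sketch of what I mean (all names are illustrative, not the
actual patch). The resource tracker folds the hypervisor-reported PCI
devices into pools, and only the pool summary travels to the scheduler
with the usual compute node stats, so no PCI tables and no extra JOIN
are needed:

    # Sketch only: illustrative names, not the actual patch.
    from collections import defaultdict

    def pci_pools(devices):
        """Summarize hypervisor-reported PCI devices into pools.

        Each pool is keyed by the properties the scheduler matches on,
        so only one count per pool crosses the wire, not one row per
        device.
        """
        pools = defaultdict(int)
        for dev in devices:
            pools[(dev['vendor_id'], dev['product_id'])] += 1
        return [{'vendor_id': v, 'product_id': p, 'count': n}
                for (v, p), n in pools.items()]

    # The resource tracker would attach the summary to the stats it
    # already reports periodically, e.g.
    #   stats['pci_pools'] = pci_pools(devs)
    # and a scheduler filter would match requests against these pools
    # in memory.

Choosing the concrete device out of a pool stays on the compute node,
which is exactly why nothing per-device has to be stored in the DB.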
>
> On Mon, Jul 22, 2013 at 8:49 PM, Ian Wells <[email protected]> wrote:
>> Per the last summit, there are many interested parties waiting on PCI
>> support. Boris (who unfortunately wasn't there) jumped in with an
>> implementation before the rest of us could get a blueprint up, but I
>> suspect he's been stretched rather thinly and progress has been much
>> slower than I was hoping it would be. There are many willing hands
>> happy to take this work on; perhaps it's time we did, so that we can
>> get something in before Havana.
>>
>> I'm sure we could use a better scheduler. I don't think that actually
>> affects most of the implementation of passthrough, and I don't think
>> we should tie the two together. "The perfect is the enemy of the
>> good."
>>
>> And as far as the quantity of data passed back: we've discussed
>> before that it would be nice (for visibility purposes) to be able to
>> see an itemised list of all of the allocated and unallocated PCI
>> resources in the database. There could be quite a lot per host (256
>> per card x say 10 cards, depending on your hardware). But passing
>> that itemised list back is somewhat of a luxury; in practice, what
>> you need for scheduling is merely a list of categories of card (those
>> pools where any one of the PCI cards in the pool would do) and
>> counts. The compute node should be choosing a card from the pool in
>> any case; the scheduler need only find a machine with cards
>> available.
>>
>> I'm not totally convinced that passing back the itemised list is
>> necessarily a problem, but in any case we can make the decision one
>> way or the other, take on the risk if we like, and get the code
>> written. If it turns out not to be scalable then we can fix *that* in
>> the next release, but at least we'll have something to play with in
>> the meantime. Delaying the whole thing to I is just silly.
>> --
>> Ian.
>>
>> On 22 July 2013 17:34, Jiang, Yunhong <[email protected]> wrote:
>> > As for the scalability issue, Boris, are you talking about the VF
>> > number issue, i.e. that a physical PCI device can have at most 256
>> > virtual functions?
>> >
>> > I think we have discussed this before. We should have the compute
>> > node manage the VFs, so that VFs belonging to the same PF have only
>> > one entry in the DB, with a field indicating the number of free
>> > VFs. Thus there will be no scalability issue, because the number of
>> > PCI slots is limited.
>> >
>> > We didn't implement this mechanism in the current patch set because
>> > we agreed to make it an enhancement. If it's really a concern,
>> > please raise it and we will enhance our resource tracker for this.
>> > That's not a complex task.
>> >
>> > Thanks
>> > --jyh
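Note that Ian's "categories of card plus counts" and Yunhong's "one
entry per PF with a free-VF field" boil down to the same shape of
record. A minimal sketch, with hypothetical field names (this is not
the submitted patch set):

    # Sketch only: hypothetical field names, not the submitted patches.

    def pf_record(pf_address, vendor_id, product_id, total_vfs):
        """One record per PF, not one per VF, so the record count is
        bounded by the number of physical PCI slots on the host."""
        return {
            'pf_address': pf_address,   # e.g. '0000:04:00.0'
            'vendor_id': vendor_id,
            'product_id': product_id,
            'total_vfs': total_vfs,     # hardware limit: at most 256
            'free_vfs': total_vfs,      # decremented on each allocation
        }

    def can_host(records, vendor_id, product_id, requested):
        """Scheduler-side check: enough free VFs of the wanted kind?"""
        free = sum(r['free_vfs'] for r in records
                   if r['vendor_id'] == vendor_id
                   and r['product_id'] == product_id)
        return free >= requested

Either way the record count is bounded by the number of PCI slots, so
the 256-VFs-per-card concern goes away whether the records live in the
DB or in the resource tracker.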
>> >
>> >> -----Original Message-----
>> >> From: Russell Bryant [mailto:[email protected]]
>> >> Sent: Monday, July 22, 2013 8:22 AM
>> >> To: Jiang, Yunhong
>> >> Cc: [email protected]; [email protected]
>> >> Subject: Re: The PCI support blueprint
>> >>
>> >> On 07/22/2013 11:17 AM, Jiang, Yunhong wrote:
>> >> > Hi, Boris
>> >> > I'm surprised that you want to postpone the PCI support
>> >> > (https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base)
>> >> > to the I release. You and our team have been working on this for
>> >> > a long time, and the patches have been reviewed several rounds.
>> >> > And we have been waiting for your DB layer patch for two weeks
>> >> > without any update.
>> >> >
>> >> > Can you give more reasons why it should be pushed to the I
>> >> > release? If you are out of bandwidth, we can certainly take it
>> >> > and push it into the H release!
>> >> >
>> >> > Is it because you want to base your DB layer on your "A simple
>> >> > way to improve nova scheduler"? That really does not make sense
>> >> > to me. Firstly, that proposal is still under early discussion
>> >> > and has several different voices already; secondly, PCI support
>> >> > is far more than the DB layer: it includes the resource tracker,
>> >> > the scheduler filter, libvirt support enhancements, etc. Even if
>> >> > we change the scheduler that way after the I release, we need
>> >> > only change the DB layer, and I don't think that's a big effort!
>> >>
>> >> Boris mentioned scalability concerns with the current approach on
>> >> IRC. I'd like more detail.
>> >>
>> >> In general, if we can see a reasonable path to upgrade what we
>> >> have now to make it better in the future, then we don't need to
>> >> block it because of that. If the current approach will result in a
>> >> large upgrade impact on users to be able to make it better, that
>> >> would be a reason to hold off. It also depends on how serious the
>> >> scalability concerns are.
>> >>
>> >> --
>> >> Russell Bryant
_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
