[openstack-dev] [nova] Can we deprecate the server backup API please?

2018-11-16 Thread Jay Pipes
The server backup API was added 8 years ago. It has Nova basically 
implementing a poor-man's cron for some unknown reason (probably because 
the original RAX Cloud Servers API had some similar or identical 
functionality, who knows...).


Can we deprecate this functionality please? It's confusing for end users 
to have both an `openstack server image create` and an `openstack server backup 
create` command where the latter does virtually the same thing as the 
former, except that it also sets up some wacky cron-like thing and deletes 
images after some number of rotations.


If a cloud provider wants to offer some backup thing as a service, they 
could implement this functionality separately IMHO, store the user's 
requested cronjob state in their own system (or in Glance, which is kind 
of how the existing Nova createBackup functionality works), and run a 
simple cronjob executor that runs `openstack server image create` and 
`openstack image delete` as needed.
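
As a rough illustration (not an existing tool -- the script, names and 
argument handling below are all hypothetical), such a cronjob executor 
could be as small as this Python wrapper around the two CLI commands 
mentioned above:

#!/usr/bin/env python3
# Hypothetical external backup rotation script -- a sketch of the
# "simple cronjob executor" described above, NOT an existing tool.
import json
import subprocess
import sys
from datetime import datetime, timezone


def openstack_json(*args):
    # Run an openstack CLI command and parse its JSON output.
    out = subprocess.check_output(("openstack",) + args + ("-f", "json"))
    return json.loads(out)


def backup(server, prefix, rotation):
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    # Equivalent of `openstack server image create`
    subprocess.check_call(
        ["openstack", "server", "image", "create",
         "--name", "%s-%s" % (prefix, stamp), server])
    # Keep only the newest <rotation> images with our naming prefix and
    # delete the rest with `openstack image delete`.
    images = [i for i in openstack_json("image", "list")
              if i["Name"] and i["Name"].startswith(prefix + "-")]
    images.sort(key=lambda i: i["Name"])  # timestamps sort lexically
    for image in images[:-rotation]:
        subprocess.check_call(["openstack", "image", "delete", image["ID"]])


if __name__ == "__main__":
    # e.g. a crontab entry like: backup-rotate.py myserver nightly 7
    backup(server=sys.argv[1], prefix=sys.argv[2], rotation=int(sys.argv[3]))

Dropped into a crontab under the provider's credentials, that covers the 
create-plus-rotate behaviour without Nova emulating cron at all.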


This is a perfect example of an API that should never have been added to 
the Compute API, in my opinion, and removing it would be a step in the 
right direction if we're going to get serious about cleaning the Compute 
API up.


Thoughts?
-jay



Re: [openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

2018-11-04 Thread Jay Pipes

On 11/02/2018 03:22 PM, Eric Fried wrote:

All-

Based on a (long) discussion yesterday [1] I have put up a patch [2]
whereby you can set [compute]resource_provider_association_refresh to
zero and the resource tracker will never* refresh the report client's
provider cache. Philosophically, we're removing the "healing" aspect of
the resource tracker's periodic and trusting that placement won't
diverge from whatever's in our cache. (If it does, it's because the op
hit the CLI, in which case they should SIGHUP - see below.)

*except:
- When we initially create the compute node record and bootstrap its
resource provider.
- When the virt driver's update_provider_tree makes a change,
update_from_provider_tree reflects them in the cache as well as pushing
them back to placement.
- If update_from_provider_tree fails, the cache is cleared and gets
rebuilt on the next periodic.
- If you send SIGHUP to the compute process, the cache is cleared.

This should dramatically reduce the number of calls to placement from
the compute service. Like, to nearly zero, unless something is actually
changing.

Can I get some initial feedback as to whether this is worth polishing up
into something real? (It will probably need a bp/spec if so.)

[1]
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-11-01.log.html#t2018-11-01T17:32:03
[2] https://review.openstack.org/#/c/614886/

==
Background
==
In the Queens release, our friends at CERN noticed a serious spike in
the number of requests to placement from compute nodes, even in a
stable-state cloud. Given that we were in the process of adding a ton of
infrastructure to support sharing and nested providers, this was not
unexpected. Roughly, what was previously:

  @periodic_task:
  GET /resource_providers/$compute_uuid
  GET /resource_providers/$compute_uuid/inventories

became more like:

  @periodic_task:
  # In Queens/Rocky, this would still just return the compute RP
  GET /resource_providers?in_tree=$compute_uuid
  # In Queens/Rocky, this would return nothing
  GET /resource_providers?member_of=...&required=MISC_SHARES...
  for each provider returned above:  # i.e. just one in Q/R
  GET /resource_providers/$compute_uuid/inventories
  GET /resource_providers/$compute_uuid/traits
  GET /resource_providers/$compute_uuid/aggregates

In a cloud the size of CERN's, the load wasn't acceptable. But at the
time, CERN worked around the problem by disabling refreshing entirely.
(The fact that this seems to have worked for them is an encouraging sign
for the proposed code change.)

We're not actually making use of most of that information, but it sets
the stage for things that we're working on in Stein and beyond, like
multiple VGPU types, bandwidth resource providers, accelerators, NUMA,
etc., so removing/reducing the amount of information we look at isn't
really an option strategically.


I support your idea of getting rid of the periodic refresh of the cache 
in the scheduler report client. Much of that was added in order to 
emulate the original way the resource tracker worked.


Most of the behaviour in the original resource tracker (and some of the 
code still in there for dealing with (surprise!) PCI passthrough devices 
and NUMA topology) was due to doing allocations on the compute node (the 
whole claims stuff). We needed to always be syncing the state of the 
compute_nodes and pci_devices tables in the cell database with whatever 
usage information was being created/modified on the compute nodes [0].


All of the "healing" code that's in the resource tracker was basically 
to deal with "soft delete", migrations that didn't complete or work 
properly, and, again, to handle allocations becoming out-of-sync because 
the compute nodes were responsible for allocating (as opposed to the 
current situation we have where the placement service -- via the 
scheduler's call to claim_resources() -- is responsible for allocating 
resources [1]).


Now that we have generation markers protecting both providers and 
consumers, we can rely on those generations to signal to the scheduler 
report client that it needs to pull fresh information about a provider 
or consumer. So, there's really no need to automatically and blindly 
refresh any more.
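
As a sketch of what relying on generations looks like in practice (this 
is illustrative, not the actual report client code; the endpoint paths 
follow the placement API, while the session/token handling and helper 
names are made up), a client can cache a provider's inventory and only 
refresh when placement rejects a write with a stale-generation conflict:

import requests

PLACEMENT = "http://placement.example.com"   # hypothetical endpoint
HEADERS = {
    "X-Auth-Token": "...",                      # assumed valid token
    "OpenStack-API-Version": "placement 1.30",  # any generation-aware version
}

# provider uuid -> {"generation": int, "inventories": dict}
_cache = {}


def refresh(rp_uuid):
    # Pull the provider and its inventories once, remembering the generation.
    rp = requests.get("%s/resource_providers/%s" % (PLACEMENT, rp_uuid),
                      headers=HEADERS).json()
    inv = requests.get("%s/resource_providers/%s/inventories" % (PLACEMENT, rp_uuid),
                       headers=HEADERS).json()
    _cache[rp_uuid] = {"generation": rp["generation"],
                       "inventories": inv["inventories"]}


def set_inventories(rp_uuid, inventories, retries=2):
    for _ in range(retries):
        if rp_uuid not in _cache:
            refresh(rp_uuid)
        resp = requests.put(
            "%s/resource_providers/%s/inventories" % (PLACEMENT, rp_uuid),
            headers=HEADERS,
            json={"resource_provider_generation": _cache[rp_uuid]["generation"],
                  "inventories": inventories})
        if resp.status_code == 409:
            # The generation moved on us: our cached view is stale. That
            # conflict, not a periodic timer, is the signal to refresh.
            del _cache[rp_uuid]
            continue
        resp.raise_for_status()
        body = resp.json()
        _cache[rp_uuid] = {"generation": body["resource_provider_generation"],
                           "inventories": body["inventories"]}
        return
    raise RuntimeError("still conflicting after refresh")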


Best,
-jay

[0] We always need to be syncing those tables because those tables, 
unlike the placement database's data modeling, couple both inventory AND 
usage in the same table structure...
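
To make that concrete, the contrast looks roughly like this (column names 
abridged and illustrative, not the full schemas):

# Cell database: one compute_nodes row carries inventory AND usage together.
compute_nodes_row = {
    "hypervisor_hostname": "node1",
    "vcpus": 32, "vcpus_used": 10,
    "memory_mb": 131072, "memory_mb_used": 40960,
}

# Placement: capacity lives in inventory records...
placement_inventory = {
    "resource_provider": "node1-rp-uuid",
    "resource_class": "VCPU",
    "total": 32,
}

# ...while usage lives in separate per-consumer allocation records.
placement_allocation = {
    "resource_provider": "node1-rp-uuid",
    "consumer": "instance-uuid",
    "resource_class": "VCPU",
    "used": 4,
}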


[1] again, except for PCI devices and NUMA topology, because of the 
tight coupling in place with the different resource trackers those types 
of resources use...





Re: [openstack-dev] [all] We're combining the lists!

2018-10-29 Thread Jay Pipes

I'm not willing to subscribe with a password over a non-TLS connection...

-jay

On 10/29/2018 12:53 PM, Jeremy Stanley wrote:

REMINDER: The openstack, openstack-dev, openstack-sigs and
openstack-operators mailing lists (to which this is being sent) will
be replaced by a new openstack-disc...@lists.openstack.org mailing
list. The new list is open for subscriptions[0] now, but is not yet
accepting posts until Monday November 19 and it's strongly
recommended to subscribe before that date so as not to miss any
messages posted there. The old lists will be configured to no longer
accept posts starting on Monday December 3, but in the interim posts
to the old lists will also get copied to the new list so it's safe
to unsubscribe from them any time after the 19th and not miss any
messages. See my previous notice[1] for details.

For those wondering, we have 127 subscribers so far on
openstack-discuss with 3 weeks to go before it will be put into use
(and 5 weeks now before the old lists are closed down for good).

[0] http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss
[1] 
http://lists.openstack.org/pipermail/openstack-dev/2018-September/134911.html





Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-26 Thread Jay Pipes

On 10/25/2018 02:44 PM, melanie witt wrote:

On Thu, 25 Oct 2018 14:00:08 -0400, Jay Pipes wrote:

On 10/25/2018 01:38 PM, Chris Friesen wrote:

On 10/24/2018 9:10 AM, Jay Pipes wrote:

Nova's API has the ability to create "quota classes", which are
basically limits for a set of resource types. There is something
called the "default quota class" which corresponds to the limits in
the CONF.quota section. Quota classes are basically templates of
limits to be applied if the calling project doesn't have any stored
project-specific limits.

Has anyone ever created a quota class that is different from "default"?


The Compute API specifically says:

"Only ‘default’ quota class is valid and used to set the default quotas,
all other quota class would not be used anywhere."

What this API does provide is the ability to set new default quotas for
*all* projects at once rather than individually specifying new defaults
for each project.


It's a "defaults template", yes.

The alternative is, you know, to just set the default values in
CONF.quota, which is what I said above. Or, if you want project X to
have different quota limits from those CONF-driven defaults, then set
the quotas for the project to some different values via the
os-quota-sets API (or better yet, just use Keystone's /limits API when
we write the "limits driver" into Nova). The issue is that the
os-quota-classes API is currently blocking *me* writing that "limits
driver" in Nova because I don't want to port nova-specific functionality
(like quota classes) to a limits driver when the Keystone /limits
endpoint doesn't have that functionality and nobody I know of has ever
used it.


When you say it's blocking you from writing the "limits driver" in nova, 
are you saying you're picking up John's unified limits spec [1]? It's 
been in -W mode and hasn't been updated in 4 weeks. In the spec, 
migration from quota classes => registered limits and deprecation of the 
existing quota API and quota classes is described.


Cheers,
-melanie

[1] https://review.openstack.org/602201


Actually, I wasn't familiar with John's spec. I'll review it today.

I was referring to my own attempts to clean up the quota system and 
remove all the limits-related methods from the QuotaDriver class...


Best,
-jay



Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-25 Thread Jay Pipes

On 10/25/2018 01:38 PM, Chris Friesen wrote:

On 10/24/2018 9:10 AM, Jay Pipes wrote:
Nova's API has the ability to create "quota classes", which are 
basically limits for a set of resource types. There is something 
called the "default quota class" which corresponds to the limits in 
the CONF.quota section. Quota classes are basically templates of 
limits to be applied if the calling project doesn't have any stored 
project-specific limits.


Has anyone ever created a quota class that is different from "default"?


The Compute API specifically says:

"Only ‘default’ quota class is valid and used to set the default quotas, 
all other quota class would not be used anywhere."


What this API does provide is the ability to set new default quotas for 
*all* projects at once rather than individually specifying new defaults 
for each project.


It's a "defaults template", yes.

The alternative is, you know, to just set the default values in 
CONF.quota, which is what I said above. Or, if you want project X to 
have different quota limits from those CONF-driven defaults, then set 
the quotas for the project to some different values via the 
os-quota-sets API (or better yet, just use Keystone's /limits API when 
we write the "limits driver" into Nova). The issue is that the 
os-quota-classes API is currently blocking *me* writing that "limits 
driver" in Nova because I don't want to port nova-specific functionality 
(like quota classes) to a limits driver when the Keystone /limits 
endpoint doesn't have that functionality and nobody I know of has ever 
used it.


Chris, are you advocating for *keeping* the os-quota-classes API?

Best,
-jay



Re: [openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-24 Thread Jay Pipes

On 10/24/2018 02:57 PM, Matt Riedemann wrote:

On 10/24/2018 10:10 AM, Jay Pipes wrote:
I'd like to propose deprecating this API and getting rid of this 
functionality since it conflicts with the new Keystone /limits 
endpoint, is highly coupled with RAX's turnstile middleware and I 
can't seem to find anyone who has ever used it. Deprecating this API 
and functionality would make the transition to a saner quota 
management system much easier and straightforward.


I was trying to do this before it was cool:

https://review.openstack.org/#/c/411035/

I think it was the Pike PTG in ATL where people said, "meh, let's just 
wait for unified limits from keystone and let this rot on the vine".


I'd be happy to restore and update that spec.


++

I think partly things have stalled out because maybe each side (keystone 
+ nova) think the other is working on something but isn't?


I'm currently working on cleaning up the quota system and would be happy 
to deprecate the os-quota-classes API along with the patch series that 
does that cleanup.


-jay



[openstack-dev] [nova][limits] Does ANYONE at all use the quota class functionality in Nova?

2018-10-24 Thread Jay Pipes
Nova's API has the ability to create "quota classes", which are 
basically limits for a set of resource types. There is something called 
the "default quota class" which corresponds to the limits in the 
CONF.quota section. Quota classes are basically templates of limits to 
be applied if the calling project doesn't have any stored 
project-specific limits.
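
In other words, the lookup order is roughly the following (a sketch only; 
the function and argument names are illustrative, not Nova's actual quota 
driver code):

def effective_limit(resource, project_limits, default_class_limits, conf_quota):
    # 1) A limit stored specifically for the calling project wins.
    if resource in project_limits:
        return project_limits[resource]
    # 2) Otherwise fall back to the "default" quota class template.
    if resource in default_class_limits:
        return default_class_limits[resource]
    # 3) Otherwise use the option value from the [quota] section of nova.conf.
    return conf_quota[resource]


# e.g. effective_limit("instances", {}, {"instances": 20}, {"instances": 10})
# returns 20: the "default" quota class template overrides the CONF.quota value.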


Has anyone ever created a quota class that is different from "default"?

I'd like to propose deprecating this API and getting rid of this 
functionality since it conflicts with the new Keystone /limits endpoint, 
is highly coupled with RAX's turnstile middleware and I can't seem to 
find anyone who has ever used it. Deprecating this API and functionality 
would make the transition to a saner quota management system much easier 
and straightforward.


Also, I'm apparently blocked now from the operators ML so could someone 
please forward this there?


Thanks,
-jay



Re: [openstack-dev] [FEMDC] [Edge] [tripleo] On the use of terms "Edge" and "Far Edge"

2018-10-18 Thread Jay Pipes

On 10/18/2018 10:23 AM, Dmitry Tantsur wrote:

Hi all,

Sorry for chiming in really late in this topic, but I think $subj is 
worth discussing until we settle harder on the potentially confusing 
terminology.


I think the difference between "Edge" and "Far Edge" is too vague to use 
these terms in practice. Think about the "edge" metaphor itself: 
something rarely has several layers of edges. A knife has an edge, there 
are no far edges. I imagine zooming in and seeing more edges at the 
edge, and then it's quite cool indeed, but is it really a useful 
metaphor for those who have never used a strong microscope? :)


I think in the trivial sense "Far Edge" is a tautology, and should be 
avoided. As a weak proof of my words, I already see a lot of smart 
people confusing these two and actually using Central/Edge where they mean 
Edge/Far Edge. I suggest we adopt different terminology, even if it is 
less consistent with the typical marketing terms around the "Edge" movement.


Now, I don't have really great suggestions. Something that came up in 
TripleO discussions [1] is Core/Hub/Edge, which I think reflects the 
idea better.


I'd be very interested to hear your ideas.


"The Edge" and "Lunatic Fringe".

There, problem solved.

-jay



Re: [openstack-dev] [nova] shall we do a spec review day next tuesday oct 23?

2018-10-15 Thread Jay Pipes
Works for me too, thanks Melanie.

On Mon, Oct 15, 2018, 10:07 AM melanie witt wrote:

> Hey all,
>
> Milestone s-1 is coming up next week on Thursday Oct 25 [1] and I was
> thinking it would be a good idea to have a spec review day next week on
> Tuesday Oct 23 to spend some focus on spec reviews together.
>
> Spec freeze is s-2 Jan 10, so the review day isn't related to any
> deadlines, but would just be a way to organize and make sure we have
> initial review on the specs that have been proposed so far.
>
> How does Tuesday Oct 23 work for everyone? Let me know if another day
> works better.
>
> So far, efried and mriedem are on board when I asked in the
> #openstack-nova channel. I'm sending this mail to gather more responses
> asynchronously.
>
> Cheers,
> -melanie
>
> [1] https://wiki.openstack.org/wiki/Nova/Stein_Release_Schedule
>


Re: [openstack-dev] [kolla][tc] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-12 Thread Jay Pipes

On 10/11/2018 01:08 PM, Zane Bitter wrote:

On 10/10/18 1:35 PM, Jay Pipes wrote:

+tc topic

On 10/10/2018 11:49 AM, Fox, Kevin M wrote:
Sorry. Couldn't quite think of the name. I was meaning, openstack 
project tags.


I think having a tag that indicates the project is no longer using 
SELECT FOR UPDATE (and thus is safe to use multi-writer Galera) is an 
excellent idea, Kevin. ++


I would support such a tag, especially if it came with detailed 
instructions on how to audit your code to make sure you are not doing 
this with sqlalchemy. (Bonus points for a flake8 plugin that can be 
enabled in the gate.)


I can contribute to such a tag's documentation, but I don't currently 
have the bandwidth to start and shepherd it.


(One question for clarification: is this actually _required_ to use 
multi-writer Galera? My previous recollection was that it was possible, 
but inefficient, to use SELECT FOR UPDATE safely as long as you wrote a 
lot of boilerplate to restart the transaction if it failed.)


Certainly not. There is just a higher occurrence of the deadlock error 
in question when using SELECT FOR UPDATE versus using a compare-and-swap 
technique that does things like this:


UPDATE tbl SET field = value, generation = generation + 1
WHERE generation = $expected_generation;
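
For completeness, here is a hedged sketch of that compare-and-swap pattern 
from the application side using SQLAlchemy (the table and column names are 
illustrative, not Nova's actual schema): read the current generation, 
attempt the guarded UPDATE, and retry on a zero row count instead of 
holding a SELECT FOR UPDATE row lock.

import sqlalchemy as sa


def cas_update(engine, table, row_id, values, max_retries=5):
    # Apply `values` only if nobody else bumped the generation in between.
    for _ in range(max_retries):
        with engine.begin() as conn:
            expected = conn.execute(
                sa.select(table.c.generation).where(table.c.id == row_id)
            ).scalar_one()
            result = conn.execute(
                table.update()
                .where(table.c.id == row_id)
                .where(table.c.generation == expected)
                .values(generation=expected + 1, **values)
            )
            if result.rowcount == 1:
                return True   # our snapshot was current; the write landed
        # rowcount == 0 means another writer got there first; re-read the
        # generation and retry rather than blocking on a row lock.
    return False

A production version would also catch and retry the writeset certification 
failure that Galera reports as a deadlock error on COMMIT; the point is 
simply that retries replace pessimistic row locks.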

The vast majority of cases I've seen where the deadlock occurred were 
during Rally tests, which were just brute-forcing breakage points and 
not particularly reflecting a real-world usage pattern.


So, in short, yes, it's perfectly safe and fine to use Galera in a 
multi-writer setup from the get-go with most OpenStack projects. It's 
just that *some* OpenStack projects of later releases have fewer code 
areas that aggravate the aforementioned deadlock conditions with Galera 
in multi-writer mode.


Best,
-jay


-jay



From: Jay Pipes [jaypi...@gmail.com]
Sent: Tuesday, October 09, 2018 12:22 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [kolla] add service discovery, proxysql, 
vault, fabio and FQDN endpoints


On 10/09/2018 03:10 PM, Fox, Kevin M wrote:
Oh, this does raise an interesting question... Should such 
information be reported by the projects up to users through labels? 
Something like, "percona_multimaster=safe" Its really difficult for 
folks to know which projects can and can not be used that way 
currently.


Are you referring to k8s labels/selectors? or are you referring to
project tags (you know, part of that whole Big Tent thing...)?

-jay



Re: [openstack-dev] [oslo][taskflow] Thoughts on moving taskflow out of openstack/oslo

2018-10-10 Thread Jay Pipes

On 10/10/2018 01:41 PM, Greg Hill wrote:
I've been out of the openstack loop for a few years, so I hope this 
reaches the right folks.


Josh Harlow (original author of taskflow and related libraries) and I 
have been discussing the option of moving taskflow out of the openstack 
umbrella recently. This move would likely also include the futurist and 
automaton libraries that are primarily used by taskflow. The idea would 
be to just host them on github and use the regular Github features for 
Issues, PRs, wiki, etc, in the hopes that this would spur more 
development. Taskflow hasn't had any substantial contributions in 
several years and it doesn't really seem that the current openstack devs 
have a vested interest in moving it forward. I would like to move it 
forward, but I don't have an interest in being bound by the openstack 
workflow (this is why the project stagnated as core reviewers were 
pulled on to other projects and couldn't keep up with the review 
backlog, so contributions ground to a halt).


I'm not sure how using pull requests instead of Gerrit changesets would 
help "core reviewers being pulled on to other projects"?


Is this just about preferring not having a non-human gatekeeper like 
Gerrit+Zuul and being able to just have a couple people merge whatever 
they want to the master HEAD without needing to talk about +2/+W rights?


If it's just about preferring the pull request workflow versus the 
Gerrit rebase workflow, just say so. Same for just preferring the Github 
UI versus Gerrit's UI (which I agree is awful).


Anyway, it's cool with me to "free" taskflow from the onerous yoke of 
OpenStack development if that's what the contributors to it want.


Best,
-jay



Re: [openstack-dev] [kolla][tc] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-10 Thread Jay Pipes

+tc topic

On 10/10/2018 11:49 AM, Fox, Kevin M wrote:

Sorry. Couldn't quite think of the name. I was meaning, openstack project tags.


I think having a tag that indicates the project is no longer using 
SELECT FOR UPDATE (and thus is safe to use multi-writer Galera) is an 
excellent idea, Kevin. ++


-jay



From: Jay Pipes [jaypi...@gmail.com]
Sent: Tuesday, October 09, 2018 12:22 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, 
fabio and FQDN endpoints

On 10/09/2018 03:10 PM, Fox, Kevin M wrote:

Oh, this does raise an interesting question... Should such information be reported by the 
projects up to users through labels? Something like, "percona_multimaster=safe" 
Its really difficult for folks to know which projects can and can not be used that way 
currently.


Are you referring to k8s labels/selectors? or are you referring to
project tags (you know, part of that whole Big Tent thing...)?

-jay



Re: [openstack-dev] [python3] Enabling py37 unit tests

2018-10-10 Thread Jay Pipes

On 10/10/2018 09:42 AM, Corey Bryant wrote:
On Wed, Oct 10, 2018 at 9:26 AM Andreas Jaeger wrote:


On 10/10/2018 14.45, Corey Bryant wrote:
 > [...]
 > == Enabling py37 unit tests ==
 >
 > Ubuntu Bionic (18.04 LTS) has the 3.7.0 interpreter and I have
reviews
 > up to define the py37 zuul job and templates here:
 > https://review.openstack.org/#/c/609066
 >
 > I'd like to start submitting reviews to projects to enable
 > openstack-python37-jobs (or variant) for projects that already have
 > openstack-python36-jobs in their .zuul.yaml, zuul.yaml,
 > .zuul.d/project.yaml.

We have projects testing python 3.5 and 3.6 already. Adding 3.7 to it is
a lot of wasted VMs. Can we limit testing and not test all three, please?

Well, I wouldn't call any of them wasted if they're testing against a 
supported Python version.


++

-jay



Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-10 Thread Jay Pipes

On 10/10/2018 06:32 AM, Balázs Gibizer wrote:

Hi,

Thanks for all the feedback. I feel the following consensus is forming:

1) remove the force flag in a new microversion. I've proposed a spec
about that API change [1]


+1


2) in the old microversions change the blind allocation copy to gather
every resource from nested source RPs too and try to allocate that
from the destination root RP. In nested allocation cases putting this
allocation to placement will fail and nova will fail the migration /
evacuation. However, it will succeed if the server does not need a nested
allocation on either the source or the destination host (a.k.a. the
legacy case), or if the server has a nested allocation on the source host
but does not need a nested allocation on the destination host (for
example the dest host does not have a nested RP tree yet).


I disagree on this. I'd rather just do a simple check for >1 provider in 
the allocations on the source and if True, fail hard.


The reverse (going from a non-nested source to a nested destination) 
will hard fail anyway on the destination because the POST /allocations 
won't work due to capacity exceeded (or failure to have any inventory at 
all for certain resource classes on the destination's root compute node).


-jay


I will start implementing #2) as part of the
use-nested-allocation-candidate bp soon and will continue with #1)
later in the cycle.

Nothing is set in stone yet so feedback is still very appreciated.

Cheers,
gibi

[1] https://review.openstack.org/#/c/609330/

On Tue, Oct 9, 2018 at 11:40 AM, Balázs Gibizer
 wrote:

Hi,

Setup
-

nested allocation: an allocation that contains resources from one or
more nested RPs. (If you have a better term for this then please
suggest it.)

If an instance has a nested allocation, it means that the compute it
allocates from has a nested RP tree. BUT if a compute has a nested
RP tree, it does not automatically mean that the instance allocating
from that compute has a nested allocation (e.g. bandwidth inventory
will be on nested RPs but not every instance will require bandwidth).

Afaiu, as soon as we have NUMA modelling in place the most trivial
servers will have nested allocations as CPU and MEMORY inventory
will be moved to the nested NUMA RPs. But NUMA is still in the future.

Sidenote: there is an edge case reported by bauzas when an instance
allocates _only_ from nested RPs. This was discussed on last Friday
and it resulted in a new patch[0] but I would like to keep that
discussion separate from this if possible.

Sidenote: the current problem is somewhat related not just to nested RPs
but to sharing RPs as well. However, I'm not aiming to implement
sharing support in Nova right now so I also try to keep the sharing
discussion separate if possible.

There was already some discussion on the Monday's scheduler meeting
but I could not attend.
http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20


The meat


Both live-migrate [1] and evacuate [2] have an optional force flag on
the nova REST API. The documentation says: "Force  by not
verifying the provided destination host by the scheduler."

Nova implements this statement by not calling the scheduler if
force=True BUT still trying to manage allocations in placement.

To have allocation on the destination host Nova blindly copies the
instance allocation from the source host to the destination host
during these operations. Nova can do that as 1) the whole allocation
is against a single RP (the compute RP) and 2) Nova knows both the
source compute RP and the destination compute RP.

However as soon as we bring nested allocations into the picture that
blind copy will not be feasible. Possible cases:
0) The instance has a non-nested allocation on the source and would
need a non-nested allocation on the destination. This works with the blind
copy today.
1) The instance has a nested allocation on the source and would need
a nested allocation on the destination as well.
2) The instance has a non-nested allocation on the source and would
need a nested allocation on the destination.
3) The instance has a nested allocation on the source and would need
a non nested allocation on the destination.

Nova cannot generate nested allocations easily without reimplementing
some of the placement allocation candidate (a_c) code. However I
don't like the idea of duplicating some of the a_c code in Nova.

Nova cannot detect what kind of allocation (nested or non-nested) an
instance would need on the destination without calling placement a_c.
So knowing when to call placement is a chicken and egg problem.

Possible solutions:
A) fail fast

0) Nova can detect that the source allocation is non-nested and try
the blind copy and it will succeed.
1) Nova can detect that the source allocation is nested and fail the
operation.
2) Nova only sees a non-nested source allocation. Even if the dest RP
tree is nested it does not mean that the 

Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-10 Thread Jay Pipes

On 10/09/2018 05:01 PM, Eric Fried wrote:

On 10/09/2018 02:20 PM, Jay Pipes wrote:

On 10/09/2018 11:04 AM, Balázs Gibizer wrote:

If you do the force flag removal in a nw microversion that also means
(at least to me) that you should not change the behavior of the force
flag in the old microversions.


Agreed.

Keep the old, buggy and unsafe behaviour for the old microversion and in
a new microversion remove the --force flag entirely and always call GET
/a_c, followed by a claim_resources() on the destination host.

For the old microversion behaviour, continue to do the "blind copy" of
allocations from the source compute node provider to the destination
compute node provider.


TBC, for nested/sharing source, we should consolidate all the resources
into a single allocation against the destination's root provider?


No. If there's >1 provider in the allocation for the source, just fail.


That "blind copy" will still fail if there isn't
capacity for the new allocations on the destination host anyway, because
the blind copy is just issuing a POST /allocations, and that code path
still checks capacity on the target resource providers.


What happens when the migration fails, either because of that POST
/allocations, or afterwards? Do we still have the old allocation around
to restore? Cause we can't re-figure it from the now-monolithic
destination allocation.


Again, just hard fail if there's >1 provider in the allocation on the 
source.



There isn't a
code path in the placement API that allows a provider's inventory
capacity to be exceeded by new allocations.

Best,
-jay



Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Jay Pipes

On 10/09/2018 03:10 PM, Fox, Kevin M wrote:

Oh, this does raise an interesting question... Should such information be reported by the 
projects up to users through labels? Something like, "percona_multimaster=safe" 
Its really difficult for folks to know which projects can and can not be used that way 
currently.


Are you referring to k8s labels/selectors? or are you referring to 
project tags (you know, part of that whole Big Tent thing...)?


-jay



Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-09 Thread Jay Pipes

On 10/09/2018 11:04 AM, Balázs Gibizer wrote:

If you do the force flag removal in a nw microversion that also means
(at least to me) that you should not change the behavior of the force
flag in the old microversions.


Agreed.

Keep the old, buggy and unsafe behaviour for the old microversion and in 
a new microversion remove the --force flag entirely and always call GET 
/a_c, followed by a claim_resources() on the destination host.


For the old microversion behaviour, continue to do the "blind copy" of 
allocations from the source compute node provider to the destination 
compute node provider. That "blind copy" will still fail if there isn't 
capacity for the new allocations on the destination host anyway, because 
the blind copy is just issuing a POST /allocations, and that code path 
still checks capacity on the target resource providers. There isn't a 
code path in the placement API that allows a provider's inventory 
capacity to be exceeded by new allocations.
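
For reference, that blind copy amounts to something like the following 
sketch. Only the endpoint names and payload shape come from the placement 
API (microversion 1.28+); the session object and helper are illustrative, 
and the >1-provider check shows where the hard-fail proposed elsewhere in 
this thread would slot in:

def blind_copy_allocations(sess, consumer_uuid, source_rp, dest_rp):
    # Read the consumer's current allocations; at microversion >= 1.28
    # placement returns project/user and the consumer generation too.
    current = sess.get("/allocations/%s" % consumer_uuid).json()
    allocs = current["allocations"]
    if len(allocs) > 1:
        # More than one provider means a nested (or sharing) allocation;
        # the proposal in this thread is to hard-fail here rather than
        # guess how to redistribute the resources.
        raise ValueError("nested allocation; refusing to blindly copy")
    resources = allocs[source_rp]["resources"]
    payload = {
        consumer_uuid: {
            "allocations": {dest_rp: {"resources": resources}},
            "project_id": current["project_id"],
            "user_id": current["user_id"],
            "consumer_generation": current["consumer_generation"],
        }
    }
    # Placement checks capacity here: if the destination root provider
    # cannot fit the resources, this POST fails and so does the operation.
    resp = sess.post("/allocations", json=payload)
    resp.raise_for_status()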


Best,
-jay



Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Jay Pipes

On 10/09/2018 06:34 AM, Florian Engelmann wrote:

On 10/9/18 11:41 AM, Jay Pipes wrote:

On 10/09/2018 04:34 AM, Christian Berendt wrote:




On 8. Oct 2018, at 19:48, Jay Pipes wrote:

Why not send all read and all write traffic to a single haproxy 
endpoint and just have haproxy spread all traffic across each Galera 
node?


Galera, after all, is multi-master synchronous replication... so it 
shouldn't matter which node in the Galera cluster you send traffic to.


Probably because of MySQL deadlocks in Galera:

—snip—
Galera cluster has known limitations, one of them is that it uses 
cluster-wide optimistic locking. This may cause some transactions to 
rollback. With an increasing number of writeable masters, the 
transaction rollback rate may increase, especially if there is write 
contention on the same dataset. It is of course possible to retry the 
transaction and perhaps it will COMMIT in the retries, but this will 
add to the transaction latency. However, some designs are deadlock 
prone, e.g sequence tables.

—snap—

Source: 
https://severalnines.com/resources/tutorials/mysql-load-balancing-haproxy-tutorial 



Have you seen the above in production?


Yes of course. Just depends on the application and how high the workload 
gets.


Please read about deadlocks and Nova in the following report by Intel:

http://galeracluster.com/wp-content/uploads/2017/06/performance_analysis_and_tuning_in_china_mobiles_openstack_production_cloud_2.pdf 


I have read the above. It's a synthetic workload analysis, which is why 
I asked if you'd seen this in production.


For the record, we addressed much of the contention/races mentioned in 
the above around scheduler resource consumption in the Ocata and Pike 
releases of Nova.


I'm aware that the report above identifies the quota handling code in 
Nova as the primary culprit of the deadlock issues but again, it's a 
synthetic workload that is designed to find breaking points. It doesn't 
represent a realistic production workload.


You can read about the deadlock issue in depth on my blog here:

http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/

That explains where the source of the problem comes from (it's the use 
of SELECT FOR UPDATE, which has been removed from Nova's quota-handling 
code in the Rocky release).


If just Nova is affected we could also create an additional HAProxy 
listener using all Galera nodes with round-robin for all other services?


I fail to see the point of using Galera with a single writer. At that 
point, why bother with Galera at all? Just use a single database node 
with a single slave for backup purposes.



Anyway - proxySQL would be a great extension.


I don't disagree that proxySQL is a good extension. However, it adds yet 
another service to the mesh that needs to be deployed, configured and 
maintained.


Best,
-jay



Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-09 Thread Jay Pipes

On 10/09/2018 04:34 AM, Christian Berendt wrote:




On 8. Oct 2018, at 19:48, Jay Pipes wrote:

Why not send all read and all write traffic to a single haproxy endpoint and 
just have haproxy spread all traffic across each Galera node?

Galera, after all, is multi-master synchronous replication... so it shouldn't 
matter which node in the Galera cluster you send traffic to.


Probably because of MySQL deadlocks in Galera:

—snip—
Galera cluster has known limitations, one of them is that it uses cluster-wide 
optimistic locking. This may cause some transactions to rollback. With an 
increasing number of writeable masters, the transaction rollback rate may 
increase, especially if there is write contention on the same dataset. It is of 
course possible to retry the transaction and perhaps it will COMMIT in the 
retries, but this will add to the transaction latency. However, some designs 
are deadlock prone, e.g sequence tables.
—snap—

Source: 
https://severalnines.com/resources/tutorials/mysql-load-balancing-haproxy-tutorial


Have you seen the above in production?

-jay



Re: [openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

2018-10-08 Thread Jay Pipes

On 10/08/2018 06:14 AM, Florian Engelmann wrote:
3. HAProxy is not capable of handling "read/write" split with Galera. I 
would like to introduce ProxySQL to be able to scale Galera.


Why not send all read and all write traffic to a single haproxy endpoint 
and just have haproxy spread all traffic across each Galera node?


Galera, after all, is multi-master synchronous replication... so it 
shouldn't matter which node in the Galera cluster you send traffic to.


-jay



Re: [openstack-dev] [nova] [ironic] agreement on how to specify options that impact scheduling and configuration

2018-10-05 Thread Jay Pipes

Added [ironic] topic.

On 10/04/2018 06:06 PM, Chris Friesen wrote:
While discussing the "Add HPET timer support for x86 guests" 
blueprint[1] one of the items that came up was how to represent what are 
essentially flags that impact both scheduling and configuration.  Eric 
Fried posted a spec to start a discussion[2], and a number of nova 
developers met on a hangout to hash it out.  This is the result.


In this specific scenario the goal was to allow the user to specify that 
their image required a virtual HPET.  For efficient scheduling we wanted 
this to map to a placement trait, and the virt driver also needed to 
enable the feature when booting the instance.  (This can be generalized 
to other similar problems, including how to specify scheduling and 
configuration information for Ironic.)


We discussed two primary approaches:

The first approach was to specify an arbitrary "key=val" in flavor 
extra-specs or image properties, which nova would automatically 
translate into the appropriate placement trait before passing it to 
placement.  Once scheduled to a compute node, the virt driver would look 
for "key=val" in the flavor/image to determine how to proceed.


The second approach was to directly specify the placement trait in the 
flavor extra-specs or image properties.  Once scheduled to a compute 
node, the virt driver would look for the placement trait in the 
flavor/image to determine how to proceed.


Ultimately, the decision was made to go with the second approach.  The 
result is that it is officially acceptable for virt drivers to key off 
placement traits specified in the image/flavor in order to turn on/off 
configuration options for the instance.  If we do get down to the virt 
driver and the trait is set, and the driver for whatever reason 
determines it's not capable of flipping the switch, it should fail.


Ironicers, pay attention to the above! :) It's a green light from Nova 
to use the traits list contained in the flavor extra specs and image 
metadata when (pre-)configuring an instance.
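
As a purely illustrative example of that green light, a virt driver could 
key off a required trait roughly like this. The helper is hypothetical, 
and the trait name in the usage comment is a placeholder for whatever name 
the os-traits patch linked below settles on:

def trait_required(trait, flavor_extra_specs, image_properties):
    # Traits are requested as "trait:<NAME>=required" in flavor extra
    # specs and image properties; scheduling already honoured them, so
    # the driver only needs to read the same keys to flip the switch.
    key = "trait:%s" % trait
    return (flavor_extra_specs.get(key) == "required"
            or image_properties.get(key) == "required")


# Hypothetical use inside a driver's spawn() path:
#   if trait_required("COMPUTE_TIME_HPET", flavor.extra_specs,
#                     image_meta.properties):
#       enable_hpet_for_guest()          # driver-specific, illustrative
#   # ...and fail the spawn if the requested trait cannot be honoured.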


It should be noted that it only makes sense to use placement traits for 
things that affect scheduling.  If it doesn't affect scheduling, then it 
can be stored in the flavor extra-specs or image properties separate 
from the placement traits.  Also, this approach only makes sense for 
simple booleans.  Anything requiring more complex configuration will 
likely need additional extra-spec and/or config and/or unicorn dust.


Ironicers, also pay close attention to the advice above. Things that are 
not "scheduleable" -- in other words, things that don't filter the list 
of hosts that a workload can land on -- should not go in traits.


Finally, here's the HPET os-traits patch. Reviews welcome (it's a tiny patch):

https://review.openstack.org/608258

Best,
-jay


Chris

[1] https://blueprints.launchpad.net/nova/+spec/support-hpet-on-guest
[2] 
https://review.openstack.org/#/c/607989/1/specs/stein/approved/support-hpet-on-guest.rst 





Re: [openstack-dev] [ironic] Tenks

2018-10-02 Thread Jay Pipes

On 10/02/2018 08:58 AM, Mark Goddard wrote:

Hi,

In the most recent Ironic meeting we discussed [1] tenks, and the 
possibility of adding the project under Ironic governance. We agreed to 
move the discussion to the mailing list. I'll introduce the project here 
and give everyone a chance to ask questions. If things appear to move in 
the right direction, I'll propose a vote for inclusion under Ironic's 
governance.


Tenks is a project for managing 'virtual bare metal clusters'. It aims 
to be a drop-in replacement for the various scripts and templates that 
exist in the Ironic devstack plugin for creating VMs to act as bare 
metal nodes in development and test environments. Similar code exists in 
Bifrost and TripleO, and probably other places too. By focusing on one 
project, we can ensure that it works well, and provides all the features 
necessary as support for bare metal in the cloud evolves.


That's tenks the concept. Tenks in reality today is a working version 
1.0, written in Ansible, built by Will Miller (w-miller) during his 
summer placement. Will has returned to his studies, and Will Szumski 
(jovial) has picked it up. You don't have to be called Will to work on 
Tenks, but it helps.


There are various resources available for anyone wishing to find out more:

* Ironic spec review: https://review.openstack.org/#/c/579583
* Documentation: https://tenks.readthedocs.io/en/latest/
* Source code: https://github.com/stackhpc/tenks
* Blog: https://stackhpc.com/tenks.html
* IRC: mgoddard or jovial in #openstack-ironic

What does everyone think? Is this something that the ironic community 
could or should take ownership of?


How does Tenks relate to OVB?

https://openstack-virtual-baremetal.readthedocs.io/en/latest/introduction.html

Best,
-jay



Re: [openstack-dev] [Openstack-operators] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-10-01 Thread Jay Pipes

On 10/01/2018 06:04 PM, Julia Kreger wrote:

On Mon, Oct 1, 2018 at 2:41 PM Eric Fried wrote:


 > So say the user requests a node that supports UEFI because their
image
 > needs UEFI. Which workflow would you want here?
 >
 > 1) The operator (or ironic?) has already configured the node to
boot in
 > UEFI mode. Only pre-configured nodes advertise the "supports
UEFI" trait.
 >
 > 2) Any node that supports UEFI mode advertises the trait. Ironic
ensures
 > that UEFI mode is enabled before provisioning the machine.
 >
 > I imagine doing #2 by passing the traits which were specifically
 > requested by the user, from Nova to Ironic, so that Ironic can do the
 > right thing for the user.
 >
 > Your proposal suggests that the user request the "supports UEFI"
trait,
 > and *also* pass some glance UUID which the user understands will make
 > sure the node actually boots in UEFI mode. Something like:
 >
 > openstack server create --flavor METAL_12CPU_128G --trait
SUPPORTS_UEFI
 > --config-data $TURN_ON_UEFI_UUID
 >
 > Note that I pass --trait because I hope that will one day be
supported
 > and we can slow down the flavor explosion.

IMO --trait would be making things worse (but see below). I think UEFI
with Jay's model would be more like:

   openstack server create --flavor METAL_12CPU_128G --config-data $UEFI

where the UEFI profile would be pretty trivial, consisting of
placement.traits.required = ["BOOT_MODE_UEFI"] and object.boot_mode =
"uefi".

I agree that this seems kind of heavy, and that it would be nice to be
able to say "boot mode is UEFI" just once. OTOH I get Jay's point that
we need to separate the placement decision from the instance
configuration.

That said, what if it was:

  openstack config-profile create --name BOOT_MODE_UEFI --json -
  {
   "type": "boot_mode_scheme",
   "version": 123,
   "object": {
       "boot_mode": "uefi"
   },
   "placement": {
    "traits": {
     "required": [
      "BOOT_MODE_UEFI"
     ]
    }
   }
  }
  ^D

And now you could in fact say

  openstack server create --flavor foo --config-profile BOOT_MODE_UEFI

using the profile name, which happens to be the same as the trait name
because you made it so. Does that satisfy the yen for saying it once? (I
mean, despite the fact that you first had to say it three times to get
it set up.)



I do want to zoom out a bit and point out that we're talking about
implementing a new framework of substantial size and impact when the
original proposal - using the trait for both - would just work out of
the box today with no changes in either API. Is it really worth it?


+1000. Reading both of these threads, it feels like we're basically 
trying to make something perfect. I think that is a fine goal, except it 
is unrealistic because the enemy of good is perfection.




By the way, with Jim's --trait suggestion, this:

 > ...dozens of flavors that look like this:
 > - 12CPU_128G_RAID10_DRIVE_LAYOUT_X
 > - 12CPU_128G_RAID5_DRIVE_LAYOUT_X
 > - 12CPU_128G_RAID01_DRIVE_LAYOUT_X
 > - 12CPU_128G_RAID10_DRIVE_LAYOUT_Y
 > - 12CPU_128G_RAID5_DRIVE_LAYOUT_Y
 > - 12CPU_128G_RAID01_DRIVE_LAYOUT_Y

...could actually become:

  openstack server create --flavor 12CPU_128G --trait $WHICH_RAID
--trait
$WHICH_LAYOUT

No flavor explosion.


++ I believe this was where this discussion kind of ended up in.. ?Dublin?

The desire and discussion that led us into complex configuration 
templates and profiles being submitted were for highly complex scenarios 
where users wanted to assert detailed raid configurations to disk. 
Naturally, there are many issues there. The ability to provide such 
detail would be awesome for those 10% of operators that need such 
functionality. Of course, if that is the only path forward, then we 
delay the 90% from getting the minimum viable feature they need.



(Maybe if we called it something other than --trait, like maybe
--config-option, it would let us pretend we're not really overloading a
trait to do config - it's just a coincidence that the config option has
the same name as the trait it causes to be required.)


I feel like it might be confusing, but totally +1 to matching required 
trait name being a thing. That way scheduling is completely decoupled 
and if everything was correct then the request should already be 
scheduled properly.


I guess I'll just drop the idea of doing this properly then. It's true 
that the placement traits concept can be hacked up and the virt driver 
can just pass a list of trait strings to the Ironic API and that's the 
most expedient way to get what the 90% of people apparently want. It's 
also true that it will add a bunch of 

Re: [openstack-dev] [placement] The "intended purpose" of traits

2018-10-01 Thread Jay Pipes

On 10/01/2018 01:20 PM, Eric Fried wrote:

I agree that we should not overload placement as a mechanism to pass
configuration information ("set up RAID5 on my storage, please") to the
driver. So let's put that aside. (Looking forward to that spec.)


ack.


I still want to use something like "Is capable of RAID5" and/or "Has
RAID5 already configured" as part of a scheduling and placement
decision. Being able to have the GET /a_c response filtered down to
providers with those, ahem, traits is the exact purpose of that operation.


And yep, I have zero problem with this either, as I've noted. This is 
precisely what placement and traits were designed for.



While we're in the neighborhood, we agreed in Denver to use a trait to
indicate which service "owns" a provider [1], so we can eventually
coordinate a smooth handoff of e.g. a device provider from nova to
cyborg. This is certainly not a capability (but it is a trait), and it
can certainly be construed as a key/value (owning_service=cyborg). Are
we rescinding that decision?


Unfortunately I have zero recollection of a conversation about using 
traits for indicating who "owns" a provider. :(


I don't think I would support such a thing -- rather, I would support 
adding an attribute to the provider model itself for an owning service 
or such thing.


That's a great example of where the attribute has specific conceptual 
meaning to placement (the concept of ownership) and should definitely 
not be tucked away, encoded into a trait string.


OK, I'll get back to that spec now... :)

Best,
-jay


[1] https://review.openstack.org/#/c/602160/


I'm working on a spec that will describe a way for the user to instruct
Nova to pass configuration data to the virt driver (or device manager)
before instance spawn. This will have nothing to do with placement or
traits, since this configuration data is not modeling scheduling and
placement decisions.

I hope to have that spec done by Monday so we can discuss on the spec.

Best,
-jay



Re: [openstack-dev] [Openstack-operators] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-10-01 Thread Jay Pipes

On 10/01/2018 09:01 AM, Jim Rollenhagen wrote:
On Mon, Oct 1, 2018 at 8:03 AM Jay Pipes wrote:


On 10/01/2018 04:36 AM, John Garbutt wrote:
 > On Fri, 28 Sep 2018 at 00:46, Jay Pipes wrote:
 >
 >     On 09/27/2018 06:23 PM, Matt Riedemann wrote:
 >      > On 9/27/2018 3:02 PM, Jay Pipes wrote:
 >      >> A great example of this would be the proposed "deploy
template"
 >     from
 >      >> [2]. This is nothing more than abusing the placement
traits API in
 >      >> order to allow passthrough of instance configuration data
from the
 >      >> nova flavor extra spec directly into the nodes.instance_info
 >     field in
 >      >> the Ironic database. It's a hack that is abusing the entire
 >     concept of
 >      >> the placement traits concept, IMHO.
 >      >>
 >      >> We should have a way *in Nova* of allowing instance
configuration
 >      >> key/value information to be passed through to the virt
driver's
 >      >> spawn() method, much the same way we provide for
user_data that
 >     gets
 >      >> exposed after boot to the guest instance via configdrive
or the
 >      >> metadata service API. What this deploy template thing is
is just a
 >      >> hack to get around the fact that nova doesn't have a
basic way of
 >      >> passing through some collated instance configuration
key/value
 >      >> information, which is a darn shame and I'm really kind of
 >     annoyed with
 >      >> myself for not noticing this sooner. :(
 >      >
 >      > We talked about this in Dublin through right? We said a good
 >     thing to do
 >      > would be to have some kind of template/profile/config/whatever
 >     stored
 >      > off in glare where schema could be registered on that
thing, and
 >     then
 >      > you pass a handle (ID reference) to that to nova when
creating the
 >      > (baremetal) server, nova pulls it down from glare and hands it
 >     off to
 >      > the virt driver. It's just that no one is doing that work.
 >
 >     No, nobody is doing that work.
 >
 >     I will if need be if it means not hacking the placement API
to serve
 >     this purpose (for which it wasn't intended).
 >
 >
 > Going back to the point Mark Goddard made, there are two things here:
 >
 > 1) Picking the correct resource provider
 > 2) Telling Ironic to transform the picked node in some way
 >
 > Today we allow the use of Capabilities for both.
 >
 > I am suggesting we move to using Traits only for (1), leaving (2) in
 > place for now, while we decide what to do (i.e. future of "deploy
 > template" concept).
 >
 > It feels like Ironic's plan to define the "deploy templates" in
Ironic
 > should replace the dependency on Glare for this use case, largely
 > because the definition of the deploy template (in my mind) is very
 > heavily related to inspector and driver properties, etc. Mark is
looking
 > at moving that forward at the moment.

That won't do anything about the flavor explosion problem, though,
right
John?


Does nova still plan to allow passing additional desired traits into the 
server create request?

I (we?) was kind of banking on that to solve the Baskin Robbins thing.


That's precisely what I've been looking into. From what I can tell, 
Ironic was planning on using these CUSTOM_DEPLOY_TEMPLATE_XXX traits in 
two ways:


1) To tell Nova what scheduling constraints the instance needed -- e.g. 
"hey Nova, make sure I land on a node that supports UEFI boot mode 
because my boot image relies on that".


2) As a convenient (because it would require no changes to Nova) way of 
passing instance pre-spawn configuration data to the Ironic virt driver 
-- e.g. pass the entire set of traits that are in the RequestSpec's 
flavor and image extra specs to Ironic before calling the Ironic node 
provision API.


#1 is fine IMHO, since it (mostly) represents a "capability" that the 
resource provider (the Ironic baremetal node) must support in order for 
the instance to successfully boot.


#2 is a problem, though, because it *doesn't* represent a capability. In 
fact, it can represent any and all sorts of key/value, JSON/dict or 
other information and this information is not intended to be passed to 
the placement/scheduler.

Re: [openstack-dev] [Openstack-operators] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-10-01 Thread Jay Pipes

On 10/01/2018 04:36 AM, John Garbutt wrote:
On Fri, 28 Sep 2018 at 00:46, Jay Pipes <jaypi...@gmail.com> wrote:


On 09/27/2018 06:23 PM, Matt Riedemann wrote:
 > On 9/27/2018 3:02 PM, Jay Pipes wrote:
 >> A great example of this would be the proposed "deploy template"
from
 >> [2]. This is nothing more than abusing the placement traits API in
 >> order to allow passthrough of instance configuration data from the
 >> nova flavor extra spec directly into the nodes.instance_info
field in
 >> the Ironic database. It's a hack that is abusing the entire
concept of
 >> the placement traits concept, IMHO.
 >>
 >> We should have a way *in Nova* of allowing instance configuration
 >> key/value information to be passed through to the virt driver's
 >> spawn() method, much the same way we provide for user_data that
gets
 >> exposed after boot to the guest instance via configdrive or the
 >> metadata service API. What this deploy template thing is is just a
 >> hack to get around the fact that nova doesn't have a basic way of
 >> passing through some collated instance configuration key/value
 >> information, which is a darn shame and I'm really kind of
annoyed with
 >> myself for not noticing this sooner. :(
 >
 > We talked about this in Dublin though, right? We said a good
thing to do
 > would be to have some kind of template/profile/config/whatever
stored
 > off in glare where schema could be registered on that thing, and
then
 > you pass a handle (ID reference) to that to nova when creating the
 > (baremetal) server, nova pulls it down from glare and hands it
off to
 > the virt driver. It's just that no one is doing that work.

No, nobody is doing that work.

I will if need be if it means not hacking the placement API to serve
this purpose (for which it wasn't intended).


Going back to the point Mark Goddard made, there are two things here:

1) Picking the correct resource provider
2) Telling Ironic to transform the picked node in some way

Today we allow the use of Capabilities for both.

I am suggesting we move to using Traits only for (1), leaving (2) in 
place for now, while we decide what to do (i.e. future of "deploy 
template" concept).


It feels like Ironic's plan to define the "deploy templates" in Ironic 
should replace the dependency on Glare for this use case, largely 
because the definition of the deploy template (in my mind) is very 
heavily related to inspector and driver properties, etc. Mark is looking 
at moving that forward at the moment.


That won't do anything about the flavor explosion problem, though, right 
John?


-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement] The "intended purpose" of traits

2018-09-29 Thread Jay Pipes

On 09/28/2018 04:36 PM, Eric Fried wrote:

So here it is. Two of the top influencers in placement, one saying we
shouldn't overload traits, the other saying we shouldn't add a primitive
that would obviate the need for that. Historically, this kind of
disagreement seems to result in an impasse: neither thing happens and
those who would benefit are forced to find a workaround or punt.
Frankly, I don't particularly care which way we go; I just want to be
able to do the things.


I don't think that's a fair statement. You absolutely *do* care which 
way we go. You want to encode multiple bits of information into a trait 
string -- such as "PCI_ADDRESS_01_AB_23_CD" -- and leave it up to the 
caller to have to understand that this trait string has multiple bits of 
information encoded in it (the fact that it's a PCI device and that the 
PCI device is at 01_AB_23_CD).


You don't see a problem encoding these variants inside a string. Chris 
doesn't either.


I *do* see a problem with it, based on my experience in Nova where this 
kind of thing leads to ugly, unmaintainable, and incomprehensible code 
as I have pointed to in previous responses.


Furthermore, your point isn't that "you just want to be able to do the 
things". Your point (and the point of others, from Cyborg and Ironic) is 
that you want to be able to use placement to pass various bits of 
information to an instance, and placement wasn't designed for that 
purpose. Nova was.


So, instead of working out a solution with the Nova team for passing 
configuration data about an instance, the proposed solution is instead 
to hack/encode multiple bits of information into a trait string. This 
proposed solution is seen as a way around having to work out a more 
appropriate solution that has Nova pass that configuration data (as is 
appropriate, since nova is the project that manages instances) to the 
virt driver or generic device manager (i.e. Cyborg) before the instance 
spawns.


I'm working on a spec that will describe a way for the user to instruct 
Nova to pass configuration data to the virt driver (or device manager) 
before instance spawn. This will have nothing to do with placement or 
traits, since this configuration data is not modeling scheduling and 
placement decisions.


I hope to have that spec done by Monday so we can discuss on the spec.

Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement] The "intended purpose" of traits

2018-09-28 Thread Jay Pipes

On 09/28/2018 04:42 PM, Eric Fried wrote:

On 09/28/2018 09:41 AM, Balázs Gibizer wrote:

On Fri, Sep 28, 2018 at 3:25 PM, Eric Fried  wrote:

It's time somebody said this.

Every time we turn a corner or look under a rug, we find another use
case for provider traits in placement. But every time we have to have
the argument about whether that use case satisfies the original
"intended purpose" of traits.

That's the only reason I've ever been able to glean: that it (whatever "it"
is) wasn't what the architects had in mind when they came up with the
idea of traits. We're not even talking about anything that would require
changes to the placement API. Just, "Oh, that's not a *capability* -
shut it down."

Bubble wrap was originally intended as a textured wallpaper and a
greenhouse insulator. Can we accept the fact that traits have (many,
many) uses beyond marking capabilities, and quit with the arbitrary
restrictions?


How far are we willing to go? Does an arbitrary (key: value) pair
encoded in a trait name like key_`str(value)` (e.g. CURRENT_TEMPERATURE:
85 encoded as CUSTOM_TEMPERATURE_85) something we would be OK to see in
placement?


Great question. Perhaps TEMPERATURE_DANGEROUSLY_HIGH is okay, but
TEMPERATURE_ is not.


That's correct, because you're encoding >1 piece of information into the 
single string (the fact that it's a temperature *and* the value of that 
temperature are the two pieces of information encoded into the single 
string).


Now that there's multiple pieces of information encoded in the string 
the reader of the trait string needs to know how to decode those bits of 
information, which is exactly what we're trying to avoid doing (because 
we can see from the ComputeCapabilitiesFilter, the extra_specs mess, and 
the giant hairball that is the NUMA and CPU pinning "metadata requests" 
how that turns out).
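
To make the decoding burden concrete, here is a tiny, made-up 
illustration of what callers end up writing when a value is encoded 
into a trait string, versus simply reading a structured attribute (the 
attribute form is purely hypothetical, of course):

    # Made-up example. The caller has to know the encoding convention
    # ("CUSTOM_TEMPERATURE_<int>") and re-parse it everywhere:
    def temperature_from_trait(trait):
        prefix = 'CUSTOM_TEMPERATURE_'
        if not trait.startswith(prefix):
            raise ValueError('not a temperature trait: %s' % trait)
        return int(trait[len(prefix):])

    print(temperature_from_trait('CUSTOM_TEMPERATURE_85'))  # 85

    # Versus a (hypothetical) structured attribute on the provider,
    # where the key and the value stay separate pieces of data:
    provider_attrs = {'temperature': 85}
    print(provider_attrs['temperature'])                    # 85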



This thread isn't about setting these parameters; it's about getting
us to a point where we can discuss a question just like this one
without running up against:
"That's a hard no, because you shouldn't encode key/value pairs in traits."

"Oh, why's that?"

"Because that's not what we intended when we created traits."

"But it would work, and the alternatives are way harder."

"-1"

"But..."

"-I


I believe I've articulated a number of times why traits should remain 
unary pieces of information, and not just said "because that's what we 
intended when we created traits".


I'm tough on this because I've seen the garbage code and unmaintainable 
mess that not having structurally sound data modeling concepts and 
information interpretation rules leads to in Nova and I don't want to 
encourage any more of it.


-jay


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [placement] The "intended purpose" of traits

2018-09-28 Thread Jay Pipes

On 09/28/2018 09:25 AM, Eric Fried wrote:

It's time somebody said this.

Every time we turn a corner or look under a rug, we find another use
case for provider traits in placement. But every time we have to have
the argument about whether that use case satisfies the original
"intended purpose" of traits.

That's the only reason I've ever been able to glean: that it (whatever "it"
is) wasn't what the architects had in mind when they came up with the
idea of traits.


Don't pussyfoot around things. It's me you're talking about, Eric. You 
could just ask me instead of passive-aggressively posting to the list 
like this.



We're not even talking about anything that would require changes to
the placement API. Just, "Oh, that's not a *capability* - shut it
down."

That's precisely the attitude that got the Nova scheduler into the 
unmaintainable and convoluted mess that it is now: "well, who cares if a 
concept was originally intended to describe X, it's just *easier* for us 
to re-use this random piece of data in ways it wasn't intended because 
that way we don't have to change anything about our docs or our API".


And *this* is the kind of stuff you end up with:

https://github.com/openstack/nova/blob/99bf62e42701397690fe2b4987ce4fd7879355b8/nova/scheduler/filters/compute_capabilities_filter.py#L35-L107

Which is a pile of unreadable, unintelligible garbage; nobody knows how 
it works, how it originally was intended to work, or how to really clean 
it up.



Bubble wrap was originally intended as a textured wallpaper and a
greenhouse insulator. Can we accept the fact that traits have (many,
many) uses beyond marking capabilities, and quit with the arbitrary
restrictions?


They aren't arbitrary. They are there for a reason: a trait is a boolean 
capability. It describes something that either a provider is capable of 
supporting or it isn't.


Conceptually, having boolean traits/capabilities is important because it 
allows the user to reason simply about how a provider meets the 
requested constraints for scheduling.


Currently, those constraints include the following (a concrete request is sketched just after this list):

* Does the provider have *capacity* for the requested resources?
* Does the provider have the required (or forbidden) *capabilities*?
* Does the provider belong to some group?
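
To make that concrete, here is roughly how those three constraints show 
up today in an allocation candidates request. This is only a sketch; 
the resource amounts, trait and aggregate UUID below are made up:

    # Made-up values, just to show the shape of the request.
    resources = 'VCPU:2,MEMORY_MB:4096'                    # capacity
    required = 'HW_CPU_X86_AVX2'                           # required capability
    # (forbidden capabilities can be expressed as !TRAIT_NAME in newer
    # microversions)
    member_of = '131d9ed0-3c6f-4b44-9d8f-37c3c04a8f3e'     # group membership
    print('GET /allocation_candidates?resources=%s&required=%s&member_of=%s'
          % (resources, required, member_of))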

If we want to add further constraints to the placement allocation 
candidates request that ask things like:


* Does the provider have version 1.22.61821 of BIOS firmware from 
Marvell installed on it?
* Does the provider support an FPGA that has had an OVS program flashed 
to it in the last 20 days?
* Does the provider belong to physical network "corpnet" and also 
support creation of virtual NICs of type either "DIRECT" or "NORMAL"?


Then we should add a data model that allows providers to be decorated 
with key/value (or more complex than key/value) information where we can 
query for those kinds of constraints without needing to encode all sorts 
of non-binary bits of information into a capability string.


Propose such a thing and I'll gladly support it. But I won't support 
bastardizing the simple concept of a boolean capability just because we 
don't want to change the API or database schema.


-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [ironic] [nova] [tripleo] Deprecation of Nova's integration with Ironic Capabilities and ComputeCapabilitiesFilter

2018-09-27 Thread Jay Pipes

On 09/27/2018 06:23 PM, Matt Riedemann wrote:

On 9/27/2018 3:02 PM, Jay Pipes wrote:
A great example of this would be the proposed "deploy template" from 
[2]. This is nothing more than abusing the placement traits API in 
order to allow passthrough of instance configuration data from the 
nova flavor extra spec directly into the nodes.instance_info field in 
the Ironic database. It's a hack that is abusing the entire concept of 
the placement traits concept, IMHO.


We should have a way *in Nova* of allowing instance configuration 
key/value information to be passed through to the virt driver's 
spawn() method, much the same way we provide for user_data that gets 
exposed after boot to the guest instance via configdrive or the 
metadata service API. What this deploy template thing is is just a 
hack to get around the fact that nova doesn't have a basic way of 
passing through some collated instance configuration key/value 
information, which is a darn shame and I'm really kind of annoyed with 
myself for not noticing this sooner. :(


We talked about this in Dublin though, right? We said a good thing to do 
would be to have some kind of template/profile/config/whatever stored 
off in glare where schema could be registered on that thing, and then 
you pass a handle (ID reference) to that to nova when creating the 
(baremetal) server, nova pulls it down from glare and hands it off to 
the virt driver. It's just that no one is doing that work.


No, nobody is doing that work.

I will if need be if it means not hacking the placement API to serve 
this purpose (for which it wasn't intended).


-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Stein PTG summary

2018-09-27 Thread Jay Pipes

On 09/27/2018 11:15 AM, Eric Fried wrote:

On 09/27/2018 07:37 AM, Matt Riedemann wrote:

On 9/27/2018 5:23 AM, Sylvain Bauza wrote:



On Thu, Sep 27, 2018 at 2:46 AM Matt Riedemann mailto:mriede...@gmail.com>> wrote:

     On 9/26/2018 5:30 PM, Sylvain Bauza wrote:
  > So, during this day, we also discussed about NUMA affinity and we
     said
  > that we could possibly use nested resource providers for NUMA
     cells in
  > Stein, but given we don't have yet a specific Placement API
     query, NUMA
  > affinity should still be using the NUMATopologyFilter.
  > That said, when looking about how to use this filter for vGPUs,
     it looks
  > to me that I'd need to provide a new version for the NUMACell
     object and
  > modify the virt.hardware module. Are we also accepting this
     (given it's
  > a temporary question), or should we need to wait for the
     Placement API
  > support ?
  >
  > Folks, what are you thoughts ?

     I'm pretty sure we've said several times already that modeling
NUMA in
     Placement is not something for which we're holding up the extraction.


It's not an extraction question. Just about knowing whether the Nova
folks would accept us to modify some o.vo object and module just for a
temporary time until Placement API has some new query parameter.
Whether Placement is extracted or not isn't really the problem, it's
more about the time it will take for this query parameter ("numbered
request groups to be in the same subtree") to be implemented in the
Placement API.
The real problem we have with vGPUs is that if we don't have NUMA
affinity, the performance would be around 10% less for vGPUs (if the
pGPU isn't on the same NUMA cell than the pCPU). Not sure large
operators would accept that :(

-Sylvain


I don't know how close we are to having whatever we need for modeling
NUMA in the placement API, but I'll go out on a limb and assume we're
not close.


True story. We've been talking about ways to do this since (at least)
the Queens PTG, but haven't even landed on a decent design, let alone
talked about getting it specced, prioritized, and implemented. Since
full NRP support was going to be a prerequisite in any case, and our
Stein plate is full, Train is the earliest we could reasonably expect to
get the placement support going, let alone the nova side. So yeah...


Given that, if we have to do something within nova for NUMA
affinity for vGPUs for the NUMATopologyFilter, then I'd be OK with that
since it's short term like you said (although our "short term"
workarounds tend to last for many releases). Anyone that cares about
NUMA today already has to enable the scheduler filter anyway.



+1 to this ^


Or, I don't know, maybe don't do anything and deal with the (maybe) 10% 
performance impact from the cross-NUMA main memory <-> CPU hit for 
post-processing of already parallel-processed GPU data.


In other words, like I've mentioned in numerous specs and in person, I 
really don't think this is a major problem and is mostly something we're 
making a big deal about for no real reason.


-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][glance][ironic][keystone][neutron][nova][edge] PTG summary on edge discussions

2018-09-26 Thread Jay Pipes

On 09/26/2018 05:10 AM, Colleen Murphy wrote:

Thanks for the summary, Ildiko. I have some questions inline.

On Tue, Sep 25, 2018, at 11:23 AM, Ildiko Vancsa wrote:





We agreed to prefer federation for Keystone and came up with two work
items to cover missing functionality:

* Keystone to trust a token from an ID Provider master and when the auth
method is called, perform an idempotent creation of the user, project
and role assignments according to the assertions made in the token


This sounds like it is based on the customizations done at Oath, which to my 
recollection did not use the actual federation implementation in keystone due 
to its reliance on Athenz (I think?) as an identity manager. Something similar 
can be accomplished in standard keystone with the mapping API in keystone which 
can cause dynamic generation of a shadow user, project and role assignments.


* Keystone should support the creation of users and projects with
predictable UUIDs (eg.: hash of the name of the users and projects).
This greatly simplifies Image federation and telemetry gathering


I was in and out of the room and don't recall this discussion exactly. We have 
historically pushed back hard against allowing setting a project ID via the 
API, though I can see predictable-but-not-settable as less problematic. One of 
the use cases from the past was being able to use the same token in different 
regions, which is problematic from a security perspective. Is that that idea 
here? Or could someone provide more details on why this is needed?


Hi Colleen,

I wasn't in the room for this conversation either, but I believe the 
"use case" wanted here is mostly a convenience one. If the edge 
deployment is composed of hundreds of small Keystone installations and 
you have a user (e.g. an NFV MANO user) which should have visibility 
across all of those Keystone installations, it becomes a hassle to need 
to remember (or in the case of headless users, store some lookup of) all 
the different tenant and user UUIDs for what is essentially the same 
user across all of those Keystone installations.


I'd argue that as long as it's possible to create a Keystone tenant and 
user with a unique name within a deployment, and as long as it's 
possible to authenticate using the tenant and user *name* (i.e. not the 
UUID), then this isn't too big of a problem. However, I do know that a 
bunch of scripts and external tools rely on setting the tenant and/or 
user via the UUID values and not the names, so that might be where this 
feature request is coming from.
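
For what it's worth, the "predictable" part could be as simple as a 
name-based UUID. A minimal sketch, assuming a deployment-wide namespace 
that every edge site agrees on (the namespace and names below are made 
up):

    import uuid

    # Every site that applies the same rule derives the same ID for the
    # same project/user name.
    EDGE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, 'edge.example.com')

    def predictable_id(name):
        # Keystone IDs are 32-character hex strings, so .hex fits.
        return uuid.uuid5(EDGE_NAMESPACE, name).hex

    print(predictable_id('nfv-mano-project'))  # identical on every site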


Hope that makes sense?

Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [penstack-dev]Discussion about the future of OpenStack in China

2018-09-24 Thread Jay Pipes

Fred,

I had a hard time understanding the articles. I'm not sure if you used 
Google Translate to do the translation from Chinese to English, but I 
personally found both of them difficult to follow.


There were a couple points that I did manage to decipher, though. One 
thing that both articles seemed to say was that OpenStack doesn't meet 
public (AWS-ish) cloud use cases and OpenStack doesn't compare favorably 
to VMWare either.


Is there a large contingent of Chinese OpenStack users that expect 
OpenStack to be a free (as in beer) version of VMware technology?


What are the 3 most important features that Chinese OpenStack users 
would like to see included in OpenStack projects?


Thanks,
-jay

On 09/24/2018 11:10 AM, Fred Li wrote:

Hi folks,

Recently there have been several blogs discussing the future of 
OpenStack. If I am not wrong, the first one is 
"OpenStack-8-year-itch"[1], and you can find its English version 
attached. Thanks to google translation. The second one is 
"5-years-my-opinion-on-OpenStack" [2] with English version attached as 
well. Please translate [3] to [6] and read them if you are interested.


I don't want to judge anything here. I just want to share them as they are 
quite a hot discussion and I think it is valuable for the whole community, 
not just part of the community, to know.


[1] https://mp.weixin.qq.com/s/GM5cMOl0q3hb_6_eEiixzA
[2] https://mp.weixin.qq.com/s/qZkE4o_BHBPlbIjekjDRKw
[3] https://mp.weixin.qq.com/s/svX4z3JM5ArQ57A1jFoyLw
[4] https://mp.weixin.qq.com/s/Nyb0OxI2Z7LxDpofTTyWOg
[5] https://mp.weixin.qq.com/s/5GV4i8kyedHSbCxCO1VBRw
[6] https://mp.weixin.qq.com/s/yeBcMogumXKGQ0KyKrgbqA
--
Regards
Fred Li (李永乐)


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic][edge] Notes from the PTG

2018-09-19 Thread Jay Pipes

On 09/19/2018 11:03 AM, Jim Rollenhagen wrote:
On Wed, Sep 19, 2018 at 8:49 AM, Jim Rollenhagen wrote:


Tuesday: edge


Since cdent asked in IRC, when we talk about edge and far edge, we 
defined these roughly like this:

https://usercontent.irccloud-cdn.com/file/NunkkS2y/edge_architecture1.JPG


Far out, man.

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Nominating Tetsuro Nakamura for placement-core

2018-09-19 Thread Jay Pipes

On 09/19/2018 11:25 AM, Chris Dent wrote:

I'd like to nominate Tetsuro Nakamura for membership in the
placement-core team. Throughout placement's development Tetsuro has
provided quality reviews; done the hard work of creating rigorous
functional tests, making them fail, and fixing them; and implemented
some of the complex functionality required at the persistence layer.
He's aware of and respects the overarching goals of placement and has
demonstrated pragmatism when balancing those goals against the
requirements of nova, blazar and other projects.

Please follow up with a +1/-1 to express your preference. No need to
be an existing placement core, everyone with an interest is welcome.


+1

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] notes from stein ptg meetings of the technical committee

2018-09-17 Thread Jay Pipes

On 09/17/2018 04:50 PM, Doug Hellmann wrote:

Excerpts from Zane Bitter's message of 2018-09-17 16:12:30 -0400:

On 17/09/18 3:06 PM, Jay Pipes wrote:

On 09/17/2018 01:31 PM, Doug Hellmann wrote:

New Project Application Process
===

We wrapped up Sunday with a discussion of our process for reviewing
new project applications. Zane and Chris in particular felt the
process for Adjutant was too painful for the project team because
there was no way to know how long discussions might go on and now
way for them to anticipate some of the issues they encountered.

We talked about formalizing a "coach" position to have someone from
the TC (or broader community) work with the team to prepare their
application with sufficient detail, seek feedback before voting
starts, etc.

We also talked about adding a time limit to the process, so that
teams at least have a rejection with feedback in a reasonable amount
of time.  Some of the less contentious discussions have averaged
from 1-4 months with a few more contentious cases taking as long
as 10 months. We did not settle on a time frame during the meeting,
so I expect this to be a topic for us to work out during the next
term.


So, to summarize... the TC is back to almost exactly the same point it
was at right before the Project Structure Reform happened in 2014-2015
(that whole Big Tent thing).


I wouldn't go that far. There are more easy decisions than there were
before the reform, but there still exist hard decisions. This is perhaps
inevitable.


The Project Structure Reform occurred because the TC could not make
decisions on whether projects should join OpenStack using objective
criteria, and due to this, new project applicants were forced to endure
long waits and subjective "graduation" reviews that could change from
one TC election cycle to the next.

The solution to this was to make an objective set of application
criteria and remove the TC from the "Supreme Court of OpenStack" role
that new applicants needed to come before and submit to the court's
judgment.

Many people complained that the Project Structure Reform was the TC
simply abrogating responsibility for being a judgmental body.

It seems that although we've now gotten rid of those objective criteria
for project inclusion and gone back to the TC being a subjective
judgmental body, that the TC is still not actually willing to pass
judgment one way or the other on new project applicants.


No criteria have been gotten rid of, but even after the Project
Structure Reform there existed criteria that were subjective. Here is a
thread discussing them during the last TC election:

http://lists.openstack.org/pipermail/openstack-dev/2018-April/129622.html

(I actually think that the perception that the criteria should be
entirely objective might be a contributor to the problem: when faced
with a subjective decision and no documentation or precedent to guide
them, TC members can be reluctant to choose.)


I think turning the decision about which projects fit the mission
into an entirely mechanical one would be a mistake. I would prefer
us to use, and trust, our judgement in cases where the answer needs
some thought.

I don't remember the history quite the way Jay does, either. I
remember us trying to base the decision more about what the team
was doing than how the code looked or whether the implementation
met anyone's idea of "good". That's why we retained the requirement
that the project "aligns with the OpenStack Mission".


Hmm. I very specifically remember the incubation and graduation review 
of Zaqar and the fact that over a couple cycles of TC elections, the 
"advice" given by the TC about specific technical implementation details 
changed, often arbitrarily, depending on who was on the TC and what day 
of the week it was. In fact, I pretty vividly remember this arbitrary 
nature of the architectural review being one of the primary reasons we 
switched to a purely objective set of criteria.


Also, for the record, I actually wasn't referring to Adjutant 
specifically when I referred in my original post to "only tangentially 
related to cloud computing". I was referring to my recollection of 
fairly recent history. I remember the seemingly endless debates about 
whether some applicants "fit" the OpenStack ecosystem or whether the 
applicant was merely trying to jump on a hype bandwagon for marketing 
purposes. Again, I wasn't specifically referring to Adjutant here, so I 
apologize if my words came across that way.


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] When can/should we change additionalProperties=False in GET /servers(/detail)?

2018-09-17 Thread Jay Pipes

On 09/17/2018 03:28 PM, Matt Riedemann wrote:
This is a question from a change [1] which adds a new changes-before 
filter to the servers, os-instance-actions and os-migrations APIs.


For context, the os-instance-actions API stopped accepting undefined 
query parameters in 2.58 when we added paging support.


The os-migrations API stopped allowing undefined query parameters in 
2.59 when we added paging support.


The open question on the review is if we should change GET /servers and 
GET /servers/detail to stop allowing undefined query parameters starting 
with microversion 2.66 [2]. Apparently when we added support for 2.5 and 
2.26 for listing servers we didn't think about this. It means that a 
user can specify a query parameter, documented in the API reference, but 
with an older microversion and it will be silently ignored. That is 
backward compatible but confusing from an end user perspective since it 
would appear to them that the filter is not being applied, when it fact 
it would be if they used the correct microversion.


So do we want to start enforcing query parameters when listing servers 
to our defined list with microversion 2.66 or just continue to silently 
ignore them if used incorrectly?


Note that starting in Rocky, the Neutron API will start rejecting 
unknown query parameters [3] if the filter-validation extension is 
enabled (since Neutron doesn't use microversions). So there is some 
precedent in OpenStack for starting to enforce query parameters.


[1] https://review.openstack.org/#/c/599276/
[2] 
https://review.openstack.org/#/c/599276/23/nova/api/openstack/compute/schemas/servers.py 

[3] 
https://docs.openstack.org/releasenotes/neutron/rocky.html#upgrade-notes


My vote would be just change additionalProperties to False in the 599276 
patch and be done with it.


Add a release note about the change, of course.
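
For anyone not following the review, the change essentially amounts to 
flipping a flag in the query-parameter JSON schema. A sketch only, not 
the actual patch contents (parameter names trimmed):

    import jsonschema

    # With additionalProperties False, an unknown query parameter becomes
    # a 400 error instead of being silently ignored.
    query_params_v266 = {
        'type': 'object',
        'properties': {
            'changes-since': {'type': 'string'},
            'changes-before': {'type': 'string'},
            # ... the rest of the documented filters ...
        },
        'additionalProperties': False,
    }

    # Raises jsonschema.ValidationError because 'bogus-filter' is unknown.
    jsonschema.validate({'bogus-filter': 'x'}, query_params_v266)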

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] notes from stein ptg meetings of the technical committee

2018-09-17 Thread Jay Pipes

On 09/17/2018 01:31 PM, Doug Hellmann wrote:

New Project Application Process
===

We wrapped up Sunday with a discussion of our process for reviewing
new project applications. Zane and Chris in particular felt the
process for Adjutant was too painful for the project team because
there was no way to know how long discussions might go on and no
way for them to anticipate some of the issues they encountered.

We talked about formalizing a "coach" position to have someone from
the TC (or broader community) work with the team to prepare their
application with sufficient detail, seek feedback before voting
starts, etc.

We also talked about adding a time limit to the process, so that
teams at least have a rejection with feedback in a reasonable amount
of time.  Some of the less contentious discussions have averaged
from 1-4 months with a few more contentious cases taking as long
as 10 months. We did not settle on a time frame during the meeting,
so I expect this to be a topic for us to work out during the next
term.


So, to summarize... the TC is back to almost exactly the same point it 
was at right before the Project Structure Reform happened in 2014-2015 
(that whole Big Tent thing).


The Project Structure Reform occurred because the TC could not make 
decisions on whether projects should join OpenStack using objective 
criteria, and due to this, new project applicants were forced to endure 
long waits and subjective "graduation" reviews that could change from 
one TC election cycle to the next.


The solution to this was to make an objective set of application 
criteria and remove the TC from the "Supreme Court of OpenStack" role 
that new applicants needed to come before and submit to the court's 
judgment.


Many people complained that the Project Structure Reform was the TC 
simply abrogating responsibility for being a judgmental body.


It seems that although we've now gotten rid of those objective criteria 
for project inclusion and gone back to the TC being a subjective 
judgmental body, that the TC is still not actually willing to pass 
judgment one way or the other on new project applicants.


Is this because it is still remarkably unclear what OpenStack actually 
*is* (the whole mission/scope thing)?


Or is this because TC members simply don't want to be the ones to say 
"No" to good-meaning people that may have an idea that is only 
tangentially related to cloud computing?


Everything old is new again.

Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] how nova should behave when placement returns consumer generation conflict

2018-09-17 Thread Jay Pipes

Thanks Giblet,

Will review this afternoon.

Best,
-jay

On 09/17/2018 09:10 AM, Balázs Gibizer wrote:


Hi,

Reworked and rebased the series based on this thread. The series starts 
here https://review.openstack.org/#/c/591597


Cheers,
gibi


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] About microversion setting to enable nested resource provider

2018-09-17 Thread Jay Pipes

On 09/16/2018 09:28 PM, Naichuan Sun wrote:

Hi, Sylvain,

In truth I’m worrying about the old root rp which include the vgpu 
inventory. There is no field in the inventory which can display which 
GPU/GPUG it belong to, right? Anyway,  will discuss it after you come back.


As Sylvain mentions below, you will need to have some mechanism in the 
XenAPI virt driver which creates child resource providers under the 
existing root provider (which is the compute node resource provider). 
You will need to have the virt driver persist the mapping between your 
internal physical GPU group name and the UUID of the resource provider 
record that the virt driver creates for that PGPU group.


So, for example, let's say you have two PGPU groups on the host. They 
are named PGPU_A and PGPU_B. The XenAPI virt driver will need to ask the 
ProviderTree object it receives in the update_provider_tree() virt 
driver method whether there is a resource provider named "PGPU_A" in the 
tree. If not, the virt driver needs to create a new child resource 
provider with the name "PGPU_A" with a parent provider pointing to the 
root compute node provider. The ProviderTree.new_child() method is used 
to create new child providers:


https://github.com/openstack/nova/blob/82270cc261f6c1d9d2cc386f1fb445dd66023f75/nova/compute/provider_tree.py#L411
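
A rough sketch of the shape of that code, with invented group names and 
inventory totals (this is not the actual XenAPI driver; it just shows 
the ProviderTree calls involved):

    # Method of the virt driver class; shown standalone here for brevity.
    # PGPU_GROUPS maps each physical GPU group name to the number of
    # assignable vGPUs it can offer (made-up numbers).
    PGPU_GROUPS = {'PGPU_A': 4, 'PGPU_B': 8}

    def update_provider_tree(self, provider_tree, nodename, allocations=None):
        for group_name, total in PGPU_GROUPS.items():
            if not provider_tree.exists(group_name):
                # Parent is the root compute node provider, referenced by name.
                provider_tree.new_child(group_name, nodename)
            provider_tree.update_inventory(
                group_name,
                {'VGPU': {'total': total,
                          'reserved': 0,
                          'min_unit': 1,
                          'max_unit': 1,
                          'step_size': 1,
                          'allocation_ratio': 1.0}})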

Hope that makes sense,
-jay


Thank very much.

BR.

Naichuan Sun

*From:*Sylvain Bauza [mailto:sba...@redhat.com]
*Sent:* Friday, September 14, 2018 9:34 PM
*To:* OpenStack Development Mailing List (not for usage questions) 

*Subject:* Re: [openstack-dev] About microversion setting to enable 
nested resource provider


On Thu, Sep 13, 2018 at 7:29 PM, Naichuan Sun <naichuan@citrix.com> wrote:


Hi, Sylvain,

    Thank you very much for the information. It is a pity that I can’t
attend the meeting.

I have a concern about reshaper in multi-type vgpu support.

    In the old vGPU support, we only have one vGPU inventory in the root
    resource provider, which means we only support one vGPU type. When we
    do the reshape, placement will send allocations (which include just one
    vGPU resource allocation) to the driver. If the host has more than one
    pGPU/pGPU group (which support different vGPU types), how do we know
    which pGPU/pGPU group owns the allocation information? Do we need to
    communicate with the hypervisor to confirm that?

The reshape will actually move the existing allocations for a VGPU 
resource class to the inventory for this class that is on the child 
resource provider now with the reshape.


Since we agreed on keeping consistent naming, there is no need to guess 
which is which. That said, you raise a point that was discussed during 
the PTG and we all agreed there was an upgrade impact as multiple vGPUs 
shouldn't be allowed until the reshape is done.


Accordingly, see my spec I reproposed for Stein which describes the 
upgrade impact https://review.openstack.org/#/c/602474/


Since I'm at the PTG, we have a huge time difference between you and me, 
but we can discuss that point next week when I'm back (my mornings 
then match your afternoons).


-Sylvain

Thank you very much.

BR.

Naichuan Sun

    *From:* Sylvain Bauza [mailto:sba...@redhat.com]
*Sent:* Thursday, September 13, 2018 11:47 PM
    *To:* OpenStack Development Mailing List (not for usage questions) <openstack-dev@lists.openstack.org>
*Subject:* Re: [openstack-dev] About microversion setting to enable
nested resource provider

Hey Naichuan,

FWIW, we discussed on the missing pieces for nested resource
providers. See the (currently-in-use) etherpad
https://etherpad.openstack.org/p/nova-ptg-stein and lookup for
"closing the gap on nested resource providers" (L144 while I speak)

The fact that we are not able to schedule yet is a critical piece
that we said we're going to work on it as soon as we can.

-Sylvain

    On Thu, Sep 13, 2018 at 9:14 AM, Eric Fried <openst...@fried.cc> wrote:

There's a patch series in progress for this:

https://review.openstack.org/#/q/topic:use-nested-allocation-candidates

It needs some TLC. I'm sure gibi and tetsuro would welcome some
help...

efried


On 09/13/2018 08:31 AM, Naichuan Sun wrote:
 > Thank you very much, Jay.
 > Is there somewhere I could set the microversion (some config
file?), or just modify the source code to set it?
 >
 > BR.
 > Naichuan Sun
 >
 > -Original Message-
 > From: Jay Pipes [mailto:jaypi...@gmail.com]
 > Sent: Thursday, September 13, 2018 9:19 PM
 > To: Naichuan Sun <naichuan@citrix.com>; OpenStack Development Mailing List (not for usage questions)

Re: [openstack-dev] About microversion setting to enable nested resource provider

2018-09-13 Thread Jay Pipes

On 09/13/2018 06:39 AM, Naichuan Sun wrote:

Hi, guys,

Looks like n-rp is disabled by default because of the microversion match at 1.29: 
https://github.com/openstack/nova/blob/master/nova/api/openstack/placement/handlers/allocation_candidate.py#L252


Anyone know how to set the microversion to enable n-rp in placement?


It is the client which must send the 1.29+ placement API microversion 
header to indicate to the placement API server that the client wants to 
receive nested provider information in the allocation candidates response.


Currently, nova-scheduler calls the scheduler reportclient's 
get_allocation_candidates() method:


https://github.com/openstack/nova/blob/0ba34a818414823eda5e693dc2127a534410b5df/nova/scheduler/manager.py#L138

The scheduler reportclient's get_allocation_candidates() method 
currently passes the 1.25 placement API microversion header:


https://github.com/openstack/nova/blob/0ba34a818414823eda5e693dc2127a534410b5df/nova/scheduler/client/report.py#L353

https://github.com/openstack/nova/blob/0ba34a818414823eda5e693dc2127a534410b5df/nova/scheduler/client/report.py#L53

In order to get the nested information returned in the allocation 
candidates response, that would need to be upped to 1.29.
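
If you want to poke at this by hand, the only placement-specific part 
is the microversion header on the request. A minimal sketch using 
keystoneauth1, with placeholder endpoint and credentials:

    from keystoneauth1 import adapter, session
    from keystoneauth1.identity import v3

    # Placeholder auth values -- adjust for your deployment.
    auth = v3.Password(auth_url='http://controller/identity/v3',
                       username='admin', password='secret',
                       project_name='admin',
                       user_domain_id='default',
                       project_domain_id='default')
    placement = adapter.Adapter(session.Session(auth=auth),
                                service_type='placement')
    # Asking for microversion 1.29 is what makes nested provider
    # information show up in the response.
    resp = placement.get('/allocation_candidates?resources=VGPU:1',
                         headers={'OpenStack-API-Version': 'placement 1.29'})
    print(resp.json()['provider_summaries'])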


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ptg][cinder][placement] etherpad for this afternoon's meeting

2018-09-11 Thread Jay Pipes
Hi Jay, where is this discussion taking place?

On Tue, Sep 11, 2018, 11:10 AM Jay S Bryant  wrote:

> All,
>
> I have created an etherpad to take notes during our meeting this
> afternoon:
> https://etherpad.openstack.org/p/cinder-placement-denver-ptg-2018
>
> If you have information you want to get in there before the meeting I
> would appreciate you pre-populating the pad.
>
> Jay
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][placement][upgrade][qa] Some upgrade-specific news on extraction

2018-09-07 Thread Jay Pipes

On 09/07/2018 11:17 AM, Dan Smith wrote:

The other obvious thing is the database. The placement repo code as-is
today still has the check for whether or not it should use the
placement database but falls back to using the nova_api database
[5]. So technically you could point the extracted placement at the
same nova_api database and it should work. However, at some point
deployers will clearly need to copy the placement-related tables out
of the nova_api DB to a new placement DB and make sure the
'migrate_version' table is dropped so that placement DB schema
versions can reset to 1.


I think it's wrong to act like placement and nova-api schemas are the
same. One is a clone of the other at a point in time, and technically it
will work today. However the placement db sync tool won't do the right
thing, and I think we run the major risk of operators not fully grokking
what is going on here, seeing that pointing placement at nova-api
"works" and move on. Later, when we add the next placement db migration
(which could technically happen in stein), they will either screw their
nova-api schema, or mess up their versioning, or be unable to apply the
placement change.


With respect to grenade and making this work in our own upgrade CI
testing, we have I think two options (which might not be mutually
exclusive):

1. Make placement support using nova.conf if placement.conf isn't
found for Stein with lots of big warnings that it's going away in
T. Then Rocky nova.conf with the nova_api database configuration just
continues to work for placement in Stein. I don't think we then have
any grenade changes to make, at least in Stein for upgrading *from*
Rocky. Assuming fresh devstack installs in Stein use placement.conf
and a placement-specific database, then upgrades from Stein to T
should also be OK with respect to grenade, but likely punts the
cut-over issue for all other deployment projects (because we don't CI
with grenade doing Rocky->Stein->T, or FFU in other words).


As I have said above and in the review, I really think this is the wrong
approach. At the current point of time, the placement schema is a clone
of the nova-api schema, and technically they will work. At the first point
that placement evolves its schema, that will no longer be a workable
solution, unless we also evolve nova-api's database in lockstep.


2. If placement doesn't support nova.conf in Stein, then grenade will
require an (exceptional) [6] from-rocky upgrade script which will (a)
write out placement.conf fresh and (b) run a DB migration script,
likely housed in the placement repo, to create the placement database
and copy the placement-specific tables out of the nova_api
database. Any script like this is likely needed regardless of what we
do in grenade because deployers will need to eventually do this once
placement would drop support for using nova.conf (if we went with
option 1).


Yep, and I'm asserting that we should write that script, make grenade do
that step, and confirm that it works. I think operators should do that
step during the stein upgrade because that's where the fork/split of
history and schema is happening. I'll volunteer to do the grenade side
at least.

Maybe it would help to call out specifically that, IMHO, this can not
and should not follow the typical config deprecation process. It's not a
simple case of just making sure we "find" the nova-api database in the
various configs. The problem is that _after_ the split, they are _not_
the same thing and should not be considered as the same. Thus, I think
to avoid major disaster and major time sink for operators later, we need
to impose the minor effort now to make sure that they don't take the
process of deploying a new service lightly.

Jay's original relatively small concern was that deploying a new
placement service and failing to properly configure it would result in a
placement running with the default, empty, sqlite database. That's a
valid concern, and I think all we need to do is make sure we fail in
that case, explaining the situation.

We just had a hangout on the topic and I think we've come around to the
consensus that just removing the default-to-empty-sqlite behavior is the
right thing to do. Placement won't magically find nova.conf if it exists
and jump into its database, and it also won't do the silly thing of
starting up with an empty database if the very important config step is
missed in the process of deploying placement itself. Operators will have
to deploy the new package and do the database surgery (which we will
provide instructions and a script for) as part of that process, but
there's really no other sane alternative without changing the current
agreed-to plan regarding the split.

Is everyone okay with the above summary of the outcome?


Yes from my perspective.

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] better name for placement

2018-09-05 Thread Jay Pipes

On 09/05/2018 11:48 AM, Jeremy Stanley wrote:

On 2018-09-05 17:01:49 +0200 (+0200), Thomas Goirand wrote:
[...]

In a distro, no 2 package can hold the same file. That's
forbidden. This has nothing to do if someone has to "import
placemement" or not.

Just saying this, but *not* that we should rename (I didn't spot
any conflict yet and I understand the pain it would induce). This
command returns nothing:

apt-file search placement | grep python3/dist-packages/placement


Well, also since the Placement maintainers have expressed that they
aren't interested in making Python API contracts for it to be usable
as an importable library, there's probably no need to install its
modules into the global Python search path anyway. They could just
go into a private module path on the filesystem instead as long as
the placement service/entrypoint wrapper knows where to find them,
right?


Yep.

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] better name for placement

2018-09-04 Thread Jay Pipes

On 09/04/2018 01:17 PM, Balázs Gibizer wrote:

On Tue, Sep 4, 2018 at 7:01 PM, Jay Pipes  wrote:

On 09/04/2018 12:59 PM, Balázs Gibizer wrote:

On Tue, Sep 4, 2018 at 6:25 PM, Jay Pipes  wrote:

On 09/04/2018 12:17 PM, Doug Hellmann wrote:

Excerpts from Jay Pipes's message of 2018-09-04 12:08:41 -0400:

On 09/04/2018 11:44 AM, Doug Hellmann wrote:

Excerpts from Chris Dent's message of 2018-09-04 15:32:12 +0100:

On Tue, 4 Sep 2018, Jay Pipes wrote:

Is there a reason we couldn't have 
openstack-placement be the package name?


I would hope we'd be able to do that, and probably should do that.
'openstack-placement' seems a fine pypi package name for a thing
from which you do 'import placement' to do some openstack stuff,
yeah?


That's still a pretty generic name for the top-level 
import, but I think
the only real risk is that the placement service couldn't be 
installed
at the same time as another package owned by someone 
else that used that

top-level name. I'm not sure how much of a risk that really is.


You mean if there was another Python package that used the package 
name

"placement"?

The alternative would be to make the top-level package something like
os_placement instead?


Either one works for me. Though I'm pretty sure that it isn't 
necessary. The reason it isn't necessary is because the stuff 
in the top-level placement package isn't meant to be imported by 
anything at all. It's the placement server code.


What about placement direct and the effort to allow cinder to 
import placement instead of running it as a separate service?


I don't know what placement direct is. Placement wasn't designed to be 
imported as a module. It was designed to be a (micro-)service with a 
REST API for interfacing.


In Vancouver we talked about allowing cinder to import placement as a 
library. See https://etherpad.openstack.org/p/YVR-cinder-placement L47


I wasn't in YVR, which explains why I'd never heard of it. There are a 
number of misconceptions in the above document about the placement 
service that don't seem to have been addressed. I'm wondering if it's 
worth revisiting the topic in Denver with the Cinder team or whether the 
Cinder team isn't interested in working with the placement service?


-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] better name for placement

2018-09-04 Thread Jay Pipes

On 09/04/2018 12:59 PM, Balázs Gibizer wrote:

On Tue, Sep 4, 2018 at 6:25 PM, Jay Pipes  wrote:

On 09/04/2018 12:17 PM, Doug Hellmann wrote:

Excerpts from Jay Pipes's message of 2018-09-04 12:08:41 -0400:

On 09/04/2018 11:44 AM, Doug Hellmann wrote:

Excerpts from Chris Dent's message of 2018-09-04 15:32:12 +0100:

On Tue, 4 Sep 2018, Jay Pipes wrote:

Is there a reason we couldn't have openstack-placement be the 
package name?


I would hope we'd be able to do that, and probably should do that.
'openstack-placement' seems a fine pypi package name for a thing
from which you do 'import placement' to do some openstack stuff,
yeah?


That's still a pretty generic name for the top-level import, but I 
think

the only real risk is that the placement service couldn't be installed
at the same time as another package owned by someone else that used 
that

top-level name. I'm not sure how much of a risk that really is.


You mean if there was another Python package that used the package name
"placement"?

The alternative would be to make the top-level package something like
os_placement instead?


Either one works for me. Though I'm pretty sure that it isn't 
necessary. The reason it isn't necessary is because the stuff in the 
top-level placement package isn't meant to be imported by anything at 
all. It's the placement server code.


What about placement direct and the effort to allow cinder to import 
placement instead of running it as a separate service?


I don't know what placement direct is. Placement wasn't designed to be 
imported as a module. It was designed to be a (micro-)service with a 
REST API for interfacing.


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] better name for placement

2018-09-04 Thread Jay Pipes

On 09/04/2018 12:17 PM, Doug Hellmann wrote:

Excerpts from Jay Pipes's message of 2018-09-04 12:08:41 -0400:

On 09/04/2018 11:44 AM, Doug Hellmann wrote:

Excerpts from Chris Dent's message of 2018-09-04 15:32:12 +0100:

On Tue, 4 Sep 2018, Jay Pipes wrote:


Is there a reason we couldn't have openstack-placement be the package name?


I would hope we'd be able to do that, and probably should do that.
'openstack-placement' seems a fine pypi package name for a thing
from which you do 'import placement' to do some openstack stuff,
yeah?


That's still a pretty generic name for the top-level import, but I think
the only real risk is that the placement service couldn't be installed
at the same time as another package owned by someone else that used that
top-level name. I'm not sure how much of a risk that really is.


You mean if there was another Python package that used the package name
"placement"?

The alternative would be to make the top-level package something like
os_placement instead?


Either one works for me. Though I'm pretty sure that it isn't necessary. 
The reason it isn't necessary is because the stuff in the top-level 
placement package isn't meant to be imported by anything at all. It's 
the placement server code.


Nothing is going to be adding openstack-placement into its 
requirements.txt file or doing:


 from placement import blah

If some part of the server repo is meant to be imported into some other 
system, say nova, then it will be pulled into a separate lib, ala 
ironiclib or neutronlib.


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] better name for placement

2018-09-04 Thread Jay Pipes

On 09/04/2018 11:44 AM, Doug Hellmann wrote:

Excerpts from Chris Dent's message of 2018-09-04 15:32:12 +0100:

On Tue, 4 Sep 2018, Jay Pipes wrote:


Is there a reason we couldn't have openstack-placement be the package name?


I would hope we'd be able to do that, and probably should do that.
'openstack-placement' seems a fine pypi package name for a thing
from which you do 'import placement' to do some openstack stuff,
yeah?


That's still a pretty generic name for the top-level import, but I think
the only real risk is that the placement service couldn't be installed
at the same time as another package owned by someone else that used that
top-level name. I'm not sure how much of a risk that really is.


You mean if there was another Python package that used the package name 
"placement"?


The alternative would be to make the top-level package something like 
os_placement instead?


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] better name for placement

2018-09-04 Thread Jay Pipes

On 09/04/2018 09:37 AM, Jeremy Stanley wrote:

On 2018-09-04 09:32:20 +0100 (+0100), Chris Dent wrote:
[...]

it allowed for the possibility that there could be another project
which provided the same service-type. That hasn't really come to
pass

[...]

It still might make sense to attempt to look at this issue from
outside the limited scope of the OpenStack community. Is the
expectation that the project when packaged (on PyPI, in Linux
distributions and so on) will just be referred to as "placement"
with no further context?


I don't see any reason why the package name needs to be identical to the 
repo name.


Is there a reason we couldn't have openstack-placement be the package name?

Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Nominating Chris Dent for placement-core

2018-08-31 Thread Jay Pipes

On 08/31/2018 11:45 AM, Eric Fried wrote:

The openstack/placement project [1] and its core team [2] have been
established in gerrit.

I hereby nominate Chris Dent for membership in the placement-core team.
He has been instrumental in the design, implementation, and stewardship
of the placement API since its inception and has shown clear and
consistent leadership.

As we are effectively bootstrapping placement-core at this time, it
would seem appropriate to consider +1/-1 responses from heavy placement
contributors as well as existing cores (currently nova-core).

[1] https://review.openstack.org/#/admin/projects/openstack/placement
[2] https://review.openstack.org/#/admin/groups/1936,members


+1

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [chef] fog-openstack 0.2.0 breakage

2018-08-31 Thread Jay Pipes
Thanks for notifying about this, Samuel. Our most modern deployment is 
actually currently blocked on this and I'm glad to see a resolution.


Best,
-jay

On 08/31/2018 11:59 AM, Samuel Cassiba wrote:

Ohai!

fog-openstack 0.2.0 was recently released, which had less than optimal
effects on Chef OpenStack due to the client cookbook's lack of version
pinning on the gem.

The crucial change is that fog-openstack itself now determines
Identity API versions internally, in preparation for a versionless
Keystone endpoint. Chef OpenStack has carried code for Identity API
determination for years, to facilitate migrating from Identity v2.0 to
Identity v3. Unfortunately, those two methods became at odds with the
release of fog-openstack 0.2.

At the time of this writing, PR #421
(https://github.com/fog/fog-openstack/pull/421) has been merged, but
there is no new release on rubygems.org as of yet. That is likely to
happen Very Soon(tm).

On the home front, with the help of Roger Luethi and Christoph Albers,
we've introduced version constraints to the client cookbook to pin the
gem to 0.1.x. At present, we've merged constraints for master,
stable/queens and stable/pike.

The new release was primed to go into ChefDK 3.2 had it not been
brought up sooner. Thank you to everyone who gave a heads-up!

Best,

scas




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron][nova] Small bandwidth demo on the PTG

2018-08-30 Thread Jay Pipes

On 08/30/2018 04:55 AM, Balázs Gibizer wrote:

Hi,

Based on the Nova PTG planning etherpad [1] there is a need to talk 
about the current state of the bandwidth work [2][3]. Bence (rubasov) 
has already planned to show a small demo to Neutron folks about the 
current state of the implementation. So Bence and I are wondering about 
bringing that demo close to the nova - neutron cross project session. 
That session is currently planned to happen Thursday after lunch. So we 
are thinking about showing the demo right before that session starts. It 
would start 30 minutes before the nova - neutron cross project session.


Are Nova folks also interested in seeing such a demo?

If you are interested in seeing the demo, please drop us a line or ping 
us on IRC so we know whom to wait for.


+1 from me. I'd be very interested in seeing it.

Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] XenServer CI failed frequently because of placement update

2018-08-29 Thread Jay Pipes
I think the immediate solution would be to just set cpu_allocation_ratio 
to 16.0 in the nova.conf that your CI system is using.
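
For anyone wanting to apply that workaround, the override would look roughly like 
the following in nova.conf -- a sketch only; the allocation ratio options have 
historically lived in the [DEFAULT] group, but double-check against the Nova 
version your CI deploys:

  [DEFAULT]
  # Stop reporting 0.0 and fall back to the long-standing default ratio
  # so placement gets a usable VCPU allocation ratio from this node.
  cpu_allocation_ratio = 16.0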


Best,
-jay

On 08/29/2018 05:26 AM, Naichuan Sun wrote:

Hi, Eric and Jay,

The VCPU/Disk/RAM allocation ratios are set to 0.0 by default, and the resource 
tracker would reset them to valid values in 
https://github.com/openstack/nova/blob/master/nova/objects/compute_node.py#L199.
But it looks like the values are set back to 0.0 by some function (I'm not sure what 
does it...), so the XenServer CI is broken. Any suggestions about that?
It looks like libvirt works well -- do they set the allocation ratios in the config file?

Thank you very much.

BR.
Naichuan Sun

-Original Message-
From: Naichuan Sun
Sent: Wednesday, August 29, 2018 7:00 AM
To: OpenStack Development Mailing List (not for usage questions) 

Subject: RE: [openstack-dev] [nova] [placement] XenServer CI failed frequently 
because of placement update

Thank you very much for the help, Bob, Jay and Eric.

Naichuan Sun

-Original Message-
From: Bob Ball [mailto:bob.b...@citrix.com]
Sent: Wednesday, August 29, 2018 12:22 AM
To: OpenStack Development Mailing List (not for usage questions) 

Subject: Re: [openstack-dev] [nova] [placement] XenServer CI failed frequently 
because of placement update


Yeah, the nova.conf cpu_allocation_ratio is being overridden to 0.0:


The default there is 0.0[1] - and the passing tempest-full from Zuul on 
https://review.openstack.org/#/c/590041/ has the same line when reading the 
config[2]:

We'll have a dig to see if we can figure out why it's not defaulting to 16 in 
the ComputeNode.

Thanks!

Bob

[1] https://git.openstack.org/cgit/openstack/nova/tree/nova/conf/compute.py#n386
[2] 
http://logs.openstack.org/41/590041/17/check/tempest-full/b3f9ddd/controller/logs/screen-n-cpu.txt.gz#_Aug_27_14_18_24_078058



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [api] Open API 3.0 for OpenStack API

2018-08-29 Thread Jay Pipes

On 08/29/2018 02:36 AM, Edison Xiang wrote:
Based on Open API 3.0, it can bring lots of benefits to the OpenStack 
Community and does not impact the current features the Community has.



3rd party developers can also do some self-defined development.


Hi Edison,

Would you mind expanding on what you are referring to with the above 
line about 3rd party developers doing self-defined development?


Thanks!
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] XenServer CI failed frequently because of placement update

2018-08-28 Thread Jay Pipes

Yeah, the nova.conf cpu_allocation_ratio is being overridden to 0.0:

Aug 27 07:43:02.179927 dsvm-devstack-citrix-lon-nodepool-1379254 
nova-compute[21125]: DEBUG oslo_service.service [None 
req-4bb236c4-54c3-42b7-aa4e-e5c8b1ece0c7 None None] cpu_allocation_ratio 
  = 0.0 {{(pid=21125) log_opt_values 
/usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:3019}}


Best,
-jay

On 08/28/2018 10:01 AM, Bob Ball wrote:

We're not running with [1], however that did also fail the CI in the same way - 
see [2] for the full logs.

The first failure appeared to be around Aug 27 08:32:14:
Aug 27 08:32:14.502788 dsvm-devstack-citrix-lon-nodepool-1379254 
devstack@placement-api.service[13219]: DEBUG nova.api.openstack.placement.requestlog 
[req-94714f18-87f3-4ff5-9b17-f6e50131b3a9 req-fc47376d-cf04-4cd3-b69c-31ef4d5739a4 service 
placement] Starting request: 192.168.33.1 "GET 
/placement/allocation_candidates?limit=1000&resources=MEMORY_MB%3A64%2CVCPU%3A1" 
{{(pid=13222) __call__ /opt/stack/new/nova/nova/api/openstack/placement/requestlog.py:38}}
Aug 27 08:32:14.583676 dsvm-devstack-citrix-lon-nodepool-1379254 
devstack@placement-api.service[13219]: DEBUG 
nova.api.openstack.placement.objects.resource_provider 
[req-94714f18-87f3-4ff5-9b17-f6e50131b3a9 
req-fc47376d-cf04-4cd3-b69c-31ef4d5739a4 service placement] found 0 providers 
with available 1 VCPU {{(pid=13222) _get_provider_ids_matching 
/opt/stack/new/nova/nova/api/openstack/placement/objects/resource_provider.py:2928}}

Just looking at Naichuan's output, I wonder if this is because allocation_ratio 
is registered as 0 in the inventory.

Bob

[2] 
http://dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/41/590041/17/check/dsvm-tempest-neutron-network/afadfe7/

-Original Message-
From: Eric Fried [mailto:openst...@fried.cc]
Sent: 28 August 2018 14:22
To: OpenStack Development Mailing List (not for usage questions) 

Subject: Re: [openstack-dev] [nova] [placement] XenServer CI failed frequently 
because of placement update

Naichuan-

Are you running with [1]? If you are, the placement logs (at debug
level) should be giving you some useful info. If you're not... perhaps you 
could pull that in :) Note that it refactors the _get_provider_ids_matching 
method completely, so it's possible your problem will magically go away when 
you do.

[1] https://review.openstack.org/#/c/590041/

On 08/28/2018 07:54 AM, Jay Pipes wrote:

On 08/28/2018 04:17 AM, Naichuan Sun wrote:

Hi, experts,

XenServer CI failed frequently with an error "No valid host was found.
" for more than a week. I think it is cause by placement update.


Hi Naichuan,

Can you give us a link to the logs of a patchset's Citrix XenServer CI
run that has failed? Also, a timestamp for the failure you refer to would
be useful so we can correlate across service logs.

Thanks,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] XenServer CI failed frequently because of placement update

2018-08-28 Thread Jay Pipes

On 08/28/2018 04:17 AM, Naichuan Sun wrote:

Hi, experts,

XenServer CI failed frequently with an error "No valid host was found. " 
for more than a week. I think it is caused by a placement update.


Hi Naichuan,

Can you give us a link to the logs of a patchset's Citrix XenServer CI run 
that has failed? Also, a timestamp for the failure you refer to would be 
useful so we can correlate across service logs.


Thanks,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] extraction (technical) update

2018-08-27 Thread Jay Pipes

On 08/27/2018 11:31 AM, Matt Riedemann wrote:

On 8/24/2018 7:36 AM, Chris Dent wrote:


Over the past few days a few of us have been experimenting with
extracting placement to its own repo, as has been discussed at
length on this list, and in some etherpads:

https://etherpad.openstack.org/p/placement-extract-stein
https://etherpad.openstack.org/p/placement-extraction-file-notes

As part of that, I've been doing some exploration to tease out the
issues we're going to hit as we do it. None of this is work that
will be merged, rather it is stuff to figure out what we need to
know to do the eventual merging correctly and efficiently.

Please note that doing that is just the near edge of a large
collection of changes that will cascade in many ways to many
projects, tools, distros, etc. The people doing this are aware of
that, and the relative simplicity (and fairly immediate success) of
these experiments is not misleading people into thinking "hey, no
big deal". It's a big deal.

There's a strategy now (described at the end of the first etherpad
listed above) for trimming the nova history to create a thing which
is placement. From the first run of that Ed created a github repo
and I branched that to eventually create:

https://github.com/EdLeafe/placement/pull/2

In that, all the placement unit and functional tests are now
passing, and my placecat [1] integration suite also passes.

That work has highlighted some gaps in the process for trimming
history which will be refined to create another interim repo. We'll
repeat this until the process is smooth, eventually resulting in an
openstack/placement.


We talked about the github strategy a bit in the placement meeting today 
[1]. Without being involved in this technical extraction work for the 
past few weeks, I came in with a different perspective on the end-game, 
and it was not aligned with what Chris/Ed thought as far as how we get 
to the official openstack/placement repo.


At a high level, Ed's repo [2] is a fork of nova with large changes on 
top using pull requests to do things like remove the non-placement nova 
files, update import paths (because the import structure changes from 
nova.api.openstack.placement to just placement), and then changes from 
Chris [3] to get tests working. Then the idea was to just use that to 
seed the openstack/placement repo and rather than review the changes 
along the way*, people that care about what changed (like myself) would 
see the tests passing and be happy enough.


However, I disagree with this approach since it bypasses our community 
code review system of using Gerrit and relying on a core team to approve 
changes at the sake of expediency.


What I would like to see are the changes that go into making the seed 
repo and what gets it to passing tests done in gerrit like we do for 
everything else. There are a couple of options on how this is done though:


1. Seed the openstack/placement repo with the filter_git_history.sh 
script output as Ed has done here [4]. This would include moving the 
placement files to the root of the tree and dropping nova-specific 
files. Then make incremental changes in gerrit like with [5] and the 
individual changes which make up Chris's big pull request [3]. I am 
primarily interested in making sure there are not content changes 
happening, only mechanical tree-restructuring type changes, stuff like 
that. I'm asking for more changes in gerrit so they can be sanely 
reviewed (per normal).


2. Eric took a slightly different tack in that he's OK with just a 
couple of large changes (or even large patch sets within a single 
change) in gerrit rather than ~30 individual changes. So that would be 
more like at most 3 changes in gerrit for [4][5][3].


3. The 3rd option is we just don't use gerrit at all and seed the 
official repo with the results of Chris and Ed's work in Ed's repo in 
github. Clearly this would be the fastest way to get us to a new repo 
(at the expense of bucking community code review and development process 
- is an exception worth it?).


Option 1 would clearly be a drain on at least 2 nova cores to go through 
the changes. I think Eric is on board for reviewing options 1 or 2 in 
either case, but he prefers option 2. Since I'm throwing a wrench in the 
works, I also need to stand up and review the changes if we go with 
option 1 or 2. Jay said he'd review them but consider these reviews 
lower priority. I expect we could get some help from some other nova 
cores though, maybe not on all changes, but at least some (thinking 
gibi, alex_xu, sfinucan).


Any CI jobs would be non-voting while going through options 1 or 2 until 
we get to a point that tests should finally be passing and we can make 
them voting (it should be possible to control this within the repo 
itself using zuul v3).


I would like to know from others (nova core or otherwise) what they 
would prefer, and if you are a nova core that wants option 1 (or 2) are 
you willing to help review 

Re: [openstack-dev] [nova] how nova should behave when placement returns consumer generation conflict

2018-08-27 Thread Jay Pipes

On 08/22/2018 08:55 AM, Balázs Gibizer wrote:

On Fri, Aug 17, 2018 at 5:40 PM, Eric Fried  wrote:

gibi-

 - On migration, when we transfer the allocations in either direction, a
 conflict means someone managed to resize (or otherwise change
 allocations?) since the last time we pulled data. Given the global lock
 in the report client, this should have been tough to do. If it does
 happen, I would think any retry would need to be done all the way back
 at the claim, which I imagine is higher up than we should go. So again,
 I think we should fail the migration and make the user retry.


 Do we want to fail the whole migration or just the migration step (e.g.
 confirm, revert)?
 The latter means that a failure during confirm or revert would put the
 instance back to VERIFY_RESIZE, while the former would mean that in case
 of a conflict at confirm we try an automatic revert. But for a conflict at
 revert we can only put the instance into ERROR state.

This again should be "impossible" to come across. What would the
behavior be if we hit, say, ValueError in this spot?


I might not totally follow you. I see two options to choose from for the 
revert case:


a) An allocation manipulation error during the revert of a migration causes 
the instance to go to ERROR. -> the end user cannot retry the revert; the 
instance needs to be deleted.


I would say this one is correct, but not because the user did anything 
wrong. Rather, *something inside Nova failed* because technically Nova 
shouldn't allow resource allocation to change while a server is in 
CONFIRMING_RESIZE task state. If we didn't make the server go to an 
ERROR state, I'm afraid we'd have no indication anywhere that this 
improper situation ever happened and we'd end up hiding some serious 
data corruption bugs.


b) An allocation manipulation error during the revert of a migration causes 
the instance to go back to VERIFY_RESIZE state. -> the end user can 
retry the revert via the API.


I see three options to choose from for the confirm case:

a) An allocation manipulation error during the confirm of a migration causes 
the instance to go to ERROR. -> the end user cannot retry the confirm; the 
instance needs to be deleted.


For the same reasons outlined above, I think this is the only safe option.

Best,
-jay

b) An allocation manipulation error during the confirm of a migration causes 
the instance to go back to VERIFY_RESIZE state. -> the end user can 
retry the confirm via the API.


c) An allocation manipulation error during the confirm of a migration causes 
nova to automatically try to revert the migration. (For a failure 
during this revert, the same options are available as for the generic revert 
case, see above.)


We also need to consider live migration. It is similar in the sense that 
it also uses move_allocations. But it is different in that the end user 
doesn't explicitly confirm or revert a live migration.


I'm looking for opinions about which option we should take in each case.

gibi



-efried



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] how nova should behave when placement returns consumer generation conflict

2018-08-27 Thread Jay Pipes

Sorry for the delay in responding to this, Gibi and Eric. Comments inline.

tl;dr: go with option a)

On 08/16/2018 11:34 AM, Eric Fried wrote:

Thanks for this, gibi.

TL;DR: a).

I didn't look, but I'm pretty sure we're not caching allocations in the
report client. Today, nobody outside of nova (specifically the resource
tracker via the report client) is supposed to be mucking with instance
allocations, right? And given the global lock in the resource tracker,
it should be pretty difficult to race e.g. a resize and a delete in any
meaningful way.


It's not a global (i.e. multi-node) lock. It's a semaphore for just that 
compute node. Migrations (mostly) involve more than one compute node, so 
the compute node semaphore is useless in that regard, thus the need to 
go with option a) and bail out if there is any change to the generation of 
any of the consumers involved in the migration operation.
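
To illustrate what that per-node semaphore looks like in practice -- this is a 
simplified sketch using oslo.concurrency directly, not the actual resource 
tracker code -- the lock only serializes threads inside a single nova-compute 
process:

  from oslo_concurrency import lockutils

  COMPUTE_RESOURCE_SEMAPHORE = "compute_resources"

  class ToyResourceTracker(object):
      """Sketch: serializes claims within ONE compute process only."""

      @lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
      def claim(self, instance_uuid, flavor):
          # Only other threads in this same nova-compute process are
          # excluded here. A migration touches two hosts, i.e. two
          # separate processes, so this lock cannot protect it.
          pass

which is why a consumer generation conflict seen during a migration has to be 
treated as a hard failure rather than something the lock could have prevented.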



So short term, IMO it is reasonable to treat any generation conflict
as an error. No retries. Possible wrinkle on delete, where it should
be a failure unless forced.


Agreed for all migration and deletion operations.


Long term, I also can't come up with any scenario where it would be
appropriate to do a narrowly-focused GET+merge/replace+retry. But
implementing the above short-term plan shouldn't prevent us from adding
retries for individual scenarios later if we do uncover places where it
makes sense.


Neither do I. Safety first, IMHO.
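
Put another way, the agreed behaviour amounts to something like the sketch below 
when writing allocations with a consumer generation. This is schematic code only: 
the names are made up, the real path goes through the scheduler report client, and 
the consumer_generation field in the payload assumes placement microversion >= 1.28.

  import requests


  class AllocationUpdateConflict(Exception):
      pass


  def put_allocations(placement_url, consumer_uuid, payload, headers):
      """Write allocations for a consumer; never retry on a generation conflict."""
      resp = requests.put(
          '%s/allocations/%s' % (placement_url, consumer_uuid),
          json=payload, headers=headers)
      if resp.status_code == 409:
          # Someone changed this consumer's allocations underneath us.
          # Per this thread: surface the failure (fail the operation, put
          # the server into ERROR where appropriate) rather than doing a
          # GET+merge+retry dance.
          raise AllocationUpdateConflict(consumer_uuid)
      resp.raise_for_status()
      return resp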

Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] UUID sentinel needs a home

2018-08-27 Thread Jay Pipes

On 08/24/2018 07:51 PM, Matt Riedemann wrote:

On 8/23/2018 2:05 PM, Chris Dent wrote:

On Thu, 23 Aug 2018, Dan Smith wrote:


...and it doesn't work like mock.sentinel does, which is part of the
value. I really think we should put this wherever it needs to be so that
it can continue to be as useful as is is today. Even if that means just
copying it into another project -- it's not that complicated of a thing.


Yeah, I agree. I had hoped that we could make something that was
generally useful, but its main value is its interface and if we
can't have that interface in a library, having it per codebase is no
biggie. For example it's been copied straight from nova into the
placement extractions experiments with no changes and, as one would
expect, works just fine.

Unless people are wed to doing something else, Dan's right, let's
just do that.


So just follow me here people, what if we had this common shared library 
where code could incubate and then we could write some tools to easily 
copy that common code into other projects...


Sounds masterful.

I'm pretty sure I could get said project approved as a top-level program 
under The Foundation and might even get a talk or two out of this idea. 
I can see the Intel money rolling in now...


Indeed, I'll open the commons bank account.

Ciao,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] UUID sentinel needs a home

2018-08-23 Thread Jay Pipes

On 08/23/2018 01:25 PM, Doug Hellmann wrote:

Excerpts from Eric Fried's message of 2018-08-23 09:51:21 -0500:

Do you mean an actual fixture, that would be used like:

  class MyTestCase(testtools.TestCase):
  def setUp(self):
  self.uuids = self.useFixture(oslofx.UUIDSentinelFixture()).uuids

  def test_foo(self):
  do_a_thing_with(self.uuids.foo)

?

That's... okay I guess, but the refactoring necessary to cut over to it
will now entail adding 'self.' to every reference. Is there any way
around that?


That is what I had envisioned, yes.  In the absence of a global,
which we do not want, what other API would you propose?


As dansmith mentioned, the niceness and simplicity of being able to do:

 import nova.tests.uuidsentinel as uuids

 ..

 def test_something(self):
 my_uuid = uuids.instance1

is remarkably powerful and is something I would want to keep.
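
For anyone who hasn't seen it, the whole trick fits in a handful of lines. This is 
a simplified sketch of the idea rather than the exact nova module:

  # uuidsentinel-style module: each attribute name maps to one stable
  # random UUID for the lifetime of the process.
  import sys
  import uuid


  class _UUIDSentinels(object):
      def __init__(self):
          self._uuids = {}

      def __getattr__(self, name):
          if name.startswith('_'):
              raise AttributeError(name)
          return self._uuids.setdefault(name, str(uuid.uuid4()))


  # Replace the module object so `import uuidsentinel as uuids` followed
  # by `uuids.instance1` just works.
  sys.modules[__name__] = _UUIDSentinels()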

Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] podman: varlink interface for nice API calls

2018-08-23 Thread Jay Pipes

Dan, thanks for the details and answers. Appreciated.

Best,
-jay

On 08/23/2018 10:50 AM, Dan Prince wrote:

On Wed, Aug 15, 2018 at 5:49 PM Jay Pipes  wrote:


On 08/15/2018 04:01 PM, Emilien Macchi wrote:

On Wed, Aug 15, 2018 at 5:31 PM Emilien Macchi <emil...@redhat.com> wrote:

 More seriously here: there is an ongoing effort to converge the
 tools around containerization within Red Hat, and we, TripleO are
 interested to continue the containerization of our services (which
 was initially done with Docker & Docker-Distribution).
 We're looking at how these containers could be managed by k8s one
 day but way before that we plan to swap out Docker and join CRI-O
 efforts, which seem to be using Podman + Buildah (among other things).

I guess my wording wasn't the best but Alex explained way better here:
http://eavesdrop.openstack.org/irclogs/%23openstack-tc/%23openstack-tc.2018-08-15.log.html#t2018-08-15T17:56:52

If I may have a chance to rephrase, I guess our current intention is to
continue our containerization and investigate how we can improve our
tooling to better orchestrate the containers.
We have a nice interface (openstack/paunch) that allows us to run
multiple container backends, and we're currently looking outside of
Docker to see how we could solve our current challenges with the new tools.
We're looking at CRI-O because it happens to be a project with a great
community, focusing on some problems that we, TripleO have been facing
since we containerized our services.

We're doing all of this in the open, so feel free to ask any question.


I appreciate your response, Emilien, thank you. Alex' responses to
Jeremy on the #openstack-tc channel were informative, thank you Alex.

For now, it *seems* to me that all of the chosen tooling is very Red Hat
centric. Which makes sense to me, considering Triple-O is a Red Hat product.


Perhaps a slight clarification here is needed. "Director" is a Red Hat
product. TripleO is an upstream project that is now largely driven by
Red Hat and is today marked as single vendor. We welcome others to
contribute to the project upstream just like anybody else.

And for those who don't know the history the TripleO project was once
multi-vendor as well. So a lot of the abstractions we have in place
could easily be extended to support distro specific implementation
details. (Kind of what I view podman as in the scope of this thread).



I don't know how much of the current reinvention of container runtimes
and various tooling around containers is the result of politics. I don't
know how much is the result of certain companies wanting to "own" the
container stack from top to bottom. Or how much is a result of technical
disagreements that simply cannot (or will not) be resolved among
contributors in the container development ecosystem.

Or is it some combination of the above? I don't know.

What I *do* know is that the current "NIH du jour" mentality currently
playing itself out in the container ecosystem -- reminding me very much
of the Javascript ecosystem -- makes it difficult for any potential
*consumers* of container libraries, runtimes or applications to be
confident that any choice they make towards one of the other will be the
*right* choice or even a *possible* choice next year -- or next week.
Perhaps this is why things like openstack/paunch exist -- to give you
options if something doesn't pan out.


This is exactly why paunch exists.

Re, the podman thing I look at it as an implementation detail. The
good news is that given it is almost a parity replacement for what we
already use we'll still contribute to the OpenStack community in
similar ways. Ultimately whether you run 'docker run' or 'podman run'
you end up with the same thing as far as the existing TripleO
architecture goes.

Dan



You have a tough job. I wish you all the luck in the world in making
these decisions and hope politics and internal corporate management
decisions play as little a role in them as possible.

Best,
-jay




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] UUID sentinel needs a home

2018-08-23 Thread Jay Pipes

On 08/23/2018 08:06 AM, Doug Hellmann wrote:

Excerpts from Davanum Srinivas (dims)'s message of 2018-08-23 06:46:38 -0400:

Where exactly Eric? I can't seem to find the import:

http://codesearch.openstack.org/?q=(from%7Cimport).*oslotest=nope==oslo.utils

-- dims


oslo.utils depends on oslotest via test-requirements.txt and oslotest is
used within the test modules in oslo.utils.

As I've said on both reviews, I think we do not want a global
singleton instance of this sentinal class. We do want a formal test
fixture.  Either library can export a test fixture and olso.utils
already has oslo_utils.fixture.TimeFixture so there's precedent to
adding it there, so I have a slight preference for just doing that.

That said, oslo_utils.uuidutils.generate_uuid() is simply returning
str(uuid.uuid4()). We have it wrapped up as a function so we can
mock it out in other tests, but we hardly need to rely on that if
we're making a test fixture for oslotest.

My vote is to add a new fixture class to oslo_utils.fixture.


OK, thanks for the helpful explanation, Doug. Works for me.

-jay
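
For the record, a fixture along the lines Doug and Eric describe could look 
something like the sketch below -- purely illustrative, with hypothetical names 
and placement, not the final oslo API:

  import uuid

  import fixtures


  class _Sentinels(object):
      def __init__(self):
          self._uuids = {}

      def __getattr__(self, name):
          if name.startswith('_'):
              raise AttributeError(name)
          return self._uuids.setdefault(name, str(uuid.uuid4()))


  class UUIDSentinelFixture(fixtures.Fixture):
      """Expose stable per-name UUIDs as `self.uuids.<name>` in a test."""

      def _setUp(self):
          self.uuids = _Sentinels()

used exactly as in Eric's example further up the thread, via 
self.useFixture(...).uuids.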

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] UUID sentinel needs a home

2018-08-22 Thread Jay Pipes
On Wed, Aug 22, 2018, 10:13 AM Eric Fried  wrote:

> For some time, nova has been using uuidsentinel [1] which conveniently
> allows you to get a random UUID in a single LOC with a readable name
> that's the same every time you reference it within that process (but not
> across processes). Example usage: [2].
>
> We would like other projects (notably the soon-to-be-split-out placement
> project) to be able to use uuidsentinel without duplicating the code. So
> we would like to stuff it in an oslo lib.
>
> The question is whether it should live in oslotest [3] or in
> oslo_utils.uuidutils [4]. The proposed patches are (almost) the same.
> The issues we've thought of so far:
>
> - If this thing is used only for test, oslotest makes sense. We haven't
> thought of a non-test use, but somebody surely will.
> - Conversely, if we put it in oslo_utils, we're kinda saying we support
> it for non-test too. (This is why the oslo_utils version does some extra
> work for thread safety and collision avoidance.)
> - In oslotest, awkwardness is necessary to avoid circular importing:
> uuidsentinel uses oslo_utils.uuidutils, which requires oslotest. In
> oslo_utils.uuidutils, everything is right there.
>

My preference is to put it in oslotest. Why does oslo_utils.uuidutils
import oslotest? That makes zero sense to me...

-jay

- It's a... UUID util. If I didn't know anything and I was looking for a
> UUID util like uuidsentinel, I would look in a module called uuidutils
> first.
>
> We hereby solicit your opinions, either by further discussion here or as
> votes on the respective patches.
>
> Thanks,
> efried
>
> [1]
>
> https://github.com/openstack/nova/blob/17b69575bc240ca1dd8b7a681de846d90f3b642c/nova/tests/uuidsentinel.py
> [2]
>
> https://github.com/openstack/nova/blob/17b69575bc240ca1dd8b7a681de846d90f3b642c/nova/tests/functional/api/openstack/placement/db/test_resource_provider.py#L109-L115
> [3] https://review.openstack.org/594068
> [4] https://review.openstack.org/594179
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [nova] [neutron] live migration with multiple port bindings.

2018-08-21 Thread Jay Pipes
On Tue, Aug 21, 2018, 8:29 AM Matt Riedemann  wrote:

> On 8/21/2018 7:28 AM, Matt Riedemann wrote:
> > On 8/20/2018 3:18 PM, Sean Mooney wrote:
> >> in both the ovs-dpdk tests, when the migration failed and the vm
> >> contiuned to run on the source node however
> >> it had no network connectivity. on hard reboot of the vm, it went to
> >> error state because the vif binding
> >> was set to none as the vif:bidning-details:host_id  was set to none so
> >> the vif_type was also set to none.
> >> i have opened a nova bug to track the fact that the vm is left in an
> >> invalid state even though the status is active.
> >> see bug 1788014
> >
> > I've got a nova patch for this here:
> >
> > https://review.openstack.org/#/c/594139/
> >
> > However, I'd like Miguel to look at that bug because I assumed that when
> > nova deletes the dest host port binding, the only remaining port binding
> > is the inactive one for the source host, and neutron would automatically
> > activate it, similar to how neutron will automatically deactivate all
> > other bindings for a port when one of the other bindings is activated
> > (like when nova activates the dest host port binding during live
> > migration, the source host port binding is automatically deactivated
> > because only one port binding can be active at any time). If there is a
> > good reason why neutron doesn't do this on port binding delete, then I
> > guess we go with fixing this in nova.
> >
>
> By the way, Sean, thanks a ton for doing all of this testing. It's super
> helpful and way above anything I could have gotten setup myself for the
> various neutron backend configurations.
>

Agreed, big +1 and thanks to Sean for doing this.

However, I'd like to point out that this highlights the unfortunate
situation we're in: only a select couple of contributors are actually able to
understand the overly complex, ludicrously inconsistent, and all too often
incompatible networking technologies that OpenStack has come to rely on.

This reminds me of a recent conversation I had on Twitter with an old
coworker of mine who is now at booking.com, who stated that the frustrating
complexity of networking and SDN setup in OpenStack was the reason he
switched to Kubernetes and hasn't looked back since.

-jay


> Thanks,
>
> Matt
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [nova] [placement] placement below or beside compute after extraction?

2018-08-20 Thread Jay Pipes

On 08/18/2018 08:25 AM, Chris Dent wrote:

So my hope is that (in no particular order) Jay Pipes, Eric Fried,
Takashi Natsume, Tetsuro Nakamura, Matt Riedemann, Andrey Volkov,
Alex Xu, Balazs Gibizer, Ed Leafe, and any other contributor to
placement whom I'm forgetting [1] would express their preference on
what they'd like to see happen.

At the same time, if people from neutron, cinder, blazar, zun,
mogan, ironic, and cyborg could express their preferences, we can get
through this by acclaim and get on with getting things done.


I am not opposed to extracting the placement service into its own repo. 
I also do not view it as a priority that should take precedence over the 
completion of other items, including the reshaper effort and the 
integration of placement calls into Nova (nested providers, sharing 
providers, etc).


The remaining items are Nova-centric. We need Nova-focused contributors 
to make placement more useful to Nova, and I fail to see how extracting 
the placement service will meet that goal. In fact, one might argue, as 
Melanie implies, that extracting placement outside of the Compute 
project would increase the velocity of the placement project *at the 
expense of* getting things done in the Nova project.


We've shown we can get many things done in placement. We've shown we can 
evolve the API fairly quickly. The velocity of the placement project 
isn't the problem. The problem is the lag between features being written 
into placement (sometimes too hastily IMHO) and actually *using* those 
features in Nova.


As for the argument about other projects being able (or being more 
willing to) use placement, I think that's not actually true. The 
projects that might want to ditch their own custom resource tracking and 
management code (Cyborg, Neutron, Cinder, Ironic) have either already 
done so or would require minimal changes to do that. There are no 
projects other than Ironic that I'm aware of that are interested in 
using the allocation candidates functionality (and the allocation claim 
process that entails) for the rough scheduling functionality that 
provides. I'm not sure placement being extracted would change that.


Would extracting placement out into its own repo result in a couple more 
people being added to the new placement core contributor team? Possibly. 
Will that result in Nova getting the integration pieces written that 
make use of placement? No, I don't believe so.


So, I'm on the fence. I understand the desire for separation, and I'm 
fully aware of my bias as a current Nova core contributor. I even 
support the process of extracting placement. But do I think it will do 
much other than provide some minor measure of independence? No, not really.


Consider me +0.

Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] podman: varlink interface for nice API calls

2018-08-15 Thread Jay Pipes

On 08/15/2018 04:01 PM, Emilien Macchi wrote:
On Wed, Aug 15, 2018 at 5:31 PM Emilien Macchi wrote:


More seriously here: there is an ongoing effort to converge the
tools around containerization within Red Hat, and we, TripleO are
interested to continue the containerization of our services (which
was initially done with Docker & Docker-Distribution).
We're looking at how these containers could be managed by k8s one
day but way before that we plan to swap out Docker and join CRI-O
efforts, which seem to be using Podman + Buildah (among other things).

I guess my wording wasn't the best but Alex explained way better here:
http://eavesdrop.openstack.org/irclogs/%23openstack-tc/%23openstack-tc.2018-08-15.log.html#t2018-08-15T17:56:52

If I may have a chance to rephrase, I guess our current intention is to 
continue our containerization and investigate how we can improve our 
tooling to better orchestrate the containers.
We have a nice interface (openstack/paunch) that allows us to run 
multiple container backends, and we're currently looking outside of 
Docker to see how we could solve our current challenges with the new tools.
We're looking at CRI-O because it happens to be a project with a great 
community, focusing on some problems that we, TripleO have been facing 
since we containerized our services.


We're doing all of this in the open, so feel free to ask any question.


I appreciate your response, Emilien, thank you. Alex' responses to 
Jeremy on the #openstack-tc channel were informative, thank you Alex.


For now, it *seems* to me that all of the chosen tooling is very Red Hat 
centric. Which makes sense to me, considering Triple-O is a Red Hat product.


I don't know how much of the current reinvention of container runtimes 
and various tooling around containers is the result of politics. I don't 
know how much is the result of certain companies wanting to "own" the 
container stack from top to bottom. Or how much is a result of technical 
disagreements that simply cannot (or will not) be resolved among 
contributors in the container development ecosystem.


Or is it some combination of the above? I don't know.

What I *do* know is that the current "NIH du jour" mentality currently 
playing itself out in the container ecosystem -- reminding me very much 
of the Javascript ecosystem -- makes it difficult for any potential 
*consumers* of container libraries, runtimes or applications to be 
confident that any choice they make towards one of the other will be the 
*right* choice or even a *possible* choice next year -- or next week. 
Perhaps this is why things like openstack/paunch exist -- to give you 
options if something doesn't pan out.


You have a tough job. I wish you all the luck in the world in making 
these decisions and hope politics and internal corporate management 
decisions play as little a role in them as possible.


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] podman: varlink interface for nice API calls

2018-08-15 Thread Jay Pipes

On 08/15/2018 05:32 AM, Cédric Jeanneret wrote:

Dear Community,

As you may know, a move toward Podman as a replacement for Docker is starting.


This was news to me. Is this just a triple-o thing?

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][tooz][etcd] need help debugging tooz test failure

2018-08-13 Thread Jay Pipes

On 08/13/2018 03:52 PM, Doug Hellmann wrote:

Excerpts from Jay Pipes's message of 2018-08-13 13:21:56 -0400:

On 08/12/2018 04:11 PM, Doug Hellmann wrote:

The tooz tests on master and stable/rocky are failing with an error:

  UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 0:
  invalid continuation byte

This is unrelated to the change, which is simply importing test job
settings or updating the .gitreview file. I need someone familiar with
the library to help debug the issue.

Can we get a volunteer?


Looking into it. Seems to be related to this upstream patch to
python-etcd3gw:

https://github.com/dims/etcd3-gateway/commit/224f40972b42c4ff16234c0e78ea765e3fe1af95

Best,
-jay



Thanks, Jay!

I see that Dims says he pushed a release. Is that something we need to
update in the constraints list, then?


Yeah, likely. We'll need to blacklist the 0.2.3 release of etcd3-gateway 
library in the openstack/tooz requirements file.


I think? :)

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][tooz][etcd] need help debugging tooz test failure

2018-08-13 Thread Jay Pipes

On 08/12/2018 04:11 PM, Doug Hellmann wrote:

The tooz tests on master and stable/rocky are failing with an error:

 UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 0:
 invalid continuation byte

This is unrelated to the change, which is simply importing test job
settings or updating the .gitreview file. I need someone familiar with
the library to help debug the issue.

Can we get a volunteer?


Looking into it. Seems to be related to this upstream patch to 
python-etcd3gw:


https://github.com/dims/etcd3-gateway/commit/224f40972b42c4ff16234c0e78ea765e3fe1af95

Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Do we still want to lowercase metadata keys?

2018-08-13 Thread Jay Pipes

On 08/13/2018 11:56 AM, Chris Friesen wrote:

On 08/13/2018 08:26 AM, Jay Pipes wrote:

On 08/13/2018 10:10 AM, Matthew Booth wrote:



I suspect I've misunderstood, but I was arguing this is an anti-goal.
There's no reason to do this if the db is working correctly, and it
would violate the principle of least surprise in dbs with legacy
datasets (being all current dbs). These values have always been mixed
case, let's just leave them be and fix the db.


Do you want case-insensitive keys or do you not want case-insensitive 
keys?


It seems to me that people complain that MySQL is case-insensitive by default 
but actually *like* the concept that a metadata key of "abc" should be "equal 
to" a metadata key of "ABC".


How do we behave on PostgreSQL?  (I realize it's unsupported, but it 
still has users.)  It's case-sensitive by default, do we override that?


Personally, I've worked on case-sensitive systems long enough that I'd 
actually be surprised if "abc" matched "ABC". :)


You have worked with case-insensitive systems for as long or longer, 
maybe without realizing it: All URLs are case-insensitive.


If a user types in http://google.com they go to the same place as 
http://Google.com because DNS is case-insensitive [1] and has been since 
its beginning. Users -- of HTTP APIs in particular -- have tended to 
become accustomed to case-insensitivity in their HTTP API calls.


This case is no different, IMHO.

Best,
-jay

[1] https://tools.ietf.org/html/rfc4343#section-4

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Do we still want to lowercase metadata keys?

2018-08-13 Thread Jay Pipes

On 08/13/2018 10:10 AM, Matthew Booth wrote:

On Mon, 13 Aug 2018 at 14:05, Jay Pipes  wrote:


On 08/13/2018 06:06 AM, Matthew Booth wrote:

Thanks mriedem for answering my previous question, and also pointing
out the related previous spec around just forcing all metadata to be
lowercase:

(Spec: approved in Newton) https://review.openstack.org/#/c/311529/
(Online migration: not merged, abandoned)
https://review.openstack.org/#/c/329737/

There are other code patches, but the above is representative. What I
had read was the original bug:

https://bugs.launchpad.net/nova/+bug/1538011

The tl;dr is that the default collation used by MySQL results in a bug
when creating 2 metadata keys which differ only in case. The proposal
was obviously to simply make all metadata keys lower case. However, as
melwitt pointed out in the bug at the time that's a potentially user
hostile change. After some lost IRC discussion it seems that folks
believed at the time that to fix this properly would seriously
compromise the performance of these queries. The agreed way forward
was to allow existing keys to keep their case, but force new keys to
be lower case (so I wonder how the above online migration came
about?).

Anyway, as Rajesh's patch shows, it's actually very easy just to fix
the MySQL misconfiguration:

https://review.openstack.org/#/c/504885/

So my question is, given that the previous series remains potentially
user hostile, the fix isn't as complex as previously believed, and it
doesn't involve a performance penalty, are there any other reasons why
we might want to resurrect it rather than just go with Rajesh's patch?
Or should we ask Rajesh to expand his patch into a series covering
other metadata?


Keep in mind this patch is only related to *aggregate* metadata, AFAICT.


Right, but the original bug pointed out that the same problem applies
equally to a bunch of different metadata stores. I haven't verified,
but the provenance was good ;) There would have to be other patches
for the other metadata stores.


Yes, it is quite unfortunate that OpenStack has about 15 different ways 
of storing metadata key/value information.




Any patch series that tries to "fix" this issue needs to include all of
the following:

* input automatically lower-cased [1]
* inline (note: not online, inline) data migration inside the
InstanceMeta object's _from_db_object() method for existing
non-lowercased keys


I suspect I've misunderstood, but I was arguing this is an anti-goal.
There's no reason to do this if the db is working correctly, and it
would violate the principle of least surprise in dbs with legacy
datasets (being all current dbs). These values have always been mixed
case, let's just leave them be and fix the db.


Do you want case-insensitive keys or do you not want case-insensitive keys?

It seems to me that people complain that MySQL is case-insensitive by 
default but actually *like* the concept that a metadata key of "abc" 
should be "equal to" a metadata key of "ABC".


In other words, it seems to me that users actually expect that:

> nova aggregate-create agg1
> nova aggregate-set-metadata agg1 abc=1
> nova aggregate-set-metadata agg1 ABC=2

should result in the original "abc" metadata item getting its value set 
to "2".


If that isn't the case -- and I have a very different impression of what 
users *actually* expect from the CLI/UI -- then let me know.


-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Do we still want to lowercase metadata keys?

2018-08-13 Thread Jay Pipes

On 08/13/2018 06:06 AM, Matthew Booth wrote:

Thanks mriedem for answering my previous question, and also pointing
out the related previous spec around just forcing all metadata to be
lowercase:

(Spec: approved in Newton) https://review.openstack.org/#/c/311529/
(Online migration: not merged, abandoned)
https://review.openstack.org/#/c/329737/

There are other code patches, but the above is representative. What I
had read was the original bug:

https://bugs.launchpad.net/nova/+bug/1538011

The tl;dr is that the default collation used by MySQL results in a bug
when creating 2 metadata keys which differ only in case. The proposal
was obviously to simply make all metadata keys lower case. However, as
melwitt pointed out in the bug at the time that's a potentially user
hostile change. After some lost IRC discussion it seems that folks
believed at the time that to fix this properly would seriously
compromise the performance of these queries. The agreed way forward
was to allow existing keys to keep their case, but force new keys to
be lower case (so I wonder how the above online migration came
about?).

Anyway, as Rajesh's patch shows, it's actually very easy just to fix
the MySQL misconfiguration:

https://review.openstack.org/#/c/504885/

So my question is, given that the previous series remains potentially
user hostile, the fix isn't as complex as previously believed, and it
doesn't involve a performance penalty, are there any other reasons why
we might want to resurrect it rather than just go with Rajesh's patch?
Or should we ask Rajesh to expand his patch into a series covering
other metadata?


Keep in mind this patch is only related to *aggregate* metadata, AFAICT.

Any patch series that tries to "fix" this issue needs to include all of 
the following:


* input automatically lower-cased [1]
* inline (note: not online, inline) data migration inside the 
InstanceMeta object's _from_db_object() method for existing 
non-lowercased keys (a rough sketch of this follows below)
* change the collation of the aggregate_metadata.key column (note: this 
will require an entire rebuild of the table, since this column is part 
of a unique constraint [3])
* online data migration for migrating non-lowercased keys to their 
lowercased counterparts (essentially doing `UPDATE key = LOWER(key) WHERE 
LOWER(key) != key` once the collation has been changed)
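
To make the inline-migration bullet concrete, the idea is roughly the following. 
This is hypothetical, heavily simplified object code -- not an actual nova patch, 
and the class and method names are placeholders:

  class AggregateMetadataObject(object):
      """Sketch: lower-case keys as the object is hydrated from the DB row."""

      def __init__(self):
          self.metadata = {}

      @classmethod
      def _from_db_object(cls, context, db_metadata):
          obj = cls()
          migrated = {k.lower(): v for k, v in db_metadata.items()}
          obj.metadata = migrated
          if migrated != db_metadata:
              # "Inline" migration: persist the lower-cased keys the first
              # time this row is read, instead of via a separate batch job.
              obj._save_metadata(context)
          return obj

      def _save_metadata(self, context):
          # Placeholder for the actual DB update.
          pass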


None of the above touches the API layer. I suppose some might argue that 
the REST API should be microversion-bumped since the expected behaviour 
of the API will change (data will be transparently changed in one 
version of the API and not another). I don't personally think that's 
something I would require a microversion for, but who knows what others 
may say.


Best,
-jay

[1] 
https://github.com/openstack/nova/blob/16f89fd093217d22530570e8277b561ea79f46ff/nova/objects/aggregate.py#L295 
and 
https://github.com/openstack/nova/blob/16f89fd093217d22530570e8277b561ea79f46ff/nova/objects/aggregate.py#L331 
and 
https://github.com/openstack/nova/blob/16f89fd093217d22530570e8277b561ea79f46ff/nova/objects/aggregate.py#L356 



[2] 
https://github.com/openstack/nova/blob/16f89fd093217d22530570e8277b561ea79f46ff/nova/objects/aggregate.py#L248


[3] 
https://github.com/openstack/nova/blob/16f89fd093217d22530570e8277b561ea79f46ff/nova/db/sqlalchemy/api_models.py#L64


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-09 Thread Jay Pipes
On Wed, Aug 1, 2018 at 11:15 AM, Ben Nemec  wrote:

> Hi,
>
> I'm having an issue with no valid host errors when starting instances and
> I'm struggling to figure out why.  I thought the problem was disk space,
> but I changed the disk_allocation_ratio and I'm still getting no valid
> host.  The host does have plenty of disk space free, so that shouldn't be a
> problem.
>
> However, I'm not even sure it's disk that's causing the failures because I
> can't find any information in the logs about why the no valid host is
> happening.  All I get from the scheduler is:
>
> "Got no allocation candidates from the Placement API. This may be a
> temporary occurrence as compute nodes start up and begin reporting
> inventory to the Placement service."
>
> While in placement I see:
>
> 2018-08-01 15:02:22.062 20 DEBUG nova.api.openstack.placement.requestlog
> [req-0a830ce9-e2af-413a-86cb-b47ae129b676 fc44fe5cefef43f4b921b9123c95e694
> b07e6dc2e6284b00ac7070aa3457c15e - default default] Starting request:
> 10.2.2.201 "GET /placement/allocation_candidat
> es?limit=1000&resources=DISK_GB%3A20%2CMEMORY_MB%3A2048%2CVCPU%3A1"
> __call__ /usr/lib/python2.7/site-packages/nova/api/openstack/placemen
> t/requestlog.py:38
> 2018-08-01 15:02:22.103 20 INFO nova.api.openstack.placement.requestlog
> [req-0a830ce9-e2af-413a-86cb-b47ae129b676 fc44fe5cefef43f4b921b9123c95e694
> b07e6dc2e6284b00ac7070aa3457c15e - default default] 10.2.2.201 "GET
> /placement/allocation_candidates?limit=1000&resources=DISK_
> GB%3A20%2CMEMORY_MB%3A2048%2CVCPU%3A1" status: 200 len: 53 microversion:
> 1.25
>
> Basically it just seems to be logging that it got a request, but there's
> no information about what it did with that request.
>
> So where do I go from here?  Is there somewhere else I can look to see why
> placement returned no candidates?
>
>
Hi again, Ben, hope you are enjoying your well-earned time off! :)

I've created a patch that (hopefully) will address some of the difficulty
that folks have had in diagnosing which parts of a request caused all
providers to be filtered out from the return of GET /allocation_candidates:

https://review.openstack.org/#/c/590041

This patch changes two primary things:

1) Query-splitting

The patch splits the existing monster SQL query that was being used for
querying for all providers that matched all requested resources, required
traits, forbidden traits and required aggregate associations into doing
multiple queries, one for each requested resource. While this does increase
the number of database queries executed for each call to GET
/allocation_candidates, the changes allow better visibility into what parts
of the request cause an exhaustion of matching providers. We've benchmarked
the new patch and have shown the performance impact of doing 3 queries
versus 1 (when there is a request for 3 resources -- VCPU, RAM and disk) is
minimal (a few extra milliseconds for execution against a DB with 1K
providers having inventory of all three resource classes).

2) Diagnostic logging output

The patch adds debug log output within each loop iteration, so there is now
logging output that shows how many matching providers were found for each
resource class involved in the request. The output looks like this in the
logs:

[req-2d30faa8-4190-4490-a91e-610045530140] inside VCPU request loop.
before applying trait and aggregate filters, found 12 matching providers
[req-2d30faa8-4190-4490-a91e-610045530140] found 12 providers with
capacity for the requested 1 VCPU.
[req-2d30faa8-4190-4490-a91e-610045530140] inside MEMORY_MB request loop.
before applying trait and aggregate filters, found 9 matching providers
[req-2d30faa8-4190-4490-a91e-610045530140] found 9 providers with
capacity for the requested 64 MEMORY_MB. before loop iteration we had
12 matches.
[req-2d30faa8-4190-4490-a91e-610045530140] RequestGroup(use_same_provider=False,
resources={MEMORY_MB:64, VCPU:1}, traits=[], aggregates=[]) (suffix '')
returned 9 matches

If a request includes required traits, forbidden traits or required
aggregate associations, there are additional log messages showing how many
matching providers were found after applying the trait or aggregate
filtering set operation (in other words, the log output shows the impact of
the trait filter or aggregate filter in much the same way that the existing
FilterScheduler logging shows the "before and after" impact that a
particular filter had on a request process).

Have a look at the patch in question and please feel free to add your
feedback and comments on ways this can be improved to meet your needs.

Best,
-jay
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][placement] Excessive WARNING level log messages in placement-api

2018-08-08 Thread Jay Pipes

For evidence, see:

http://logs.openstack.org/41/590041/1/check/tempest-full-py3/db08dec/controller/logs/screen-placement-api.txt.gz?level=WARNING

thousands of these are filling the logs with WARNING-level log messages, 
making it difficult to find anything:


Aug 08 22:17:30.837557 ubuntu-xenial-inap-mtl01-0001226060 
devstack@placement-api.service[14403]: WARNING py.warnings 
[req-a809b022-59af-4628-be73-488cfec3187d 
req-d46cb1f0-431f-490f-955b-b9c2cd9f6437 service placement] 
/usr/local/lib/python3.5/dist-packages/oslo_policy/policy.py:896: 
UserWarning: Policy placement:resource_providers:list failed scope 
check. The token used to make the request was project scoped but the 
policy requires ['system'] scope. This behavior may change in the future 
where using the intended scope is required
Aug 08 22:17:30.837800 ubuntu-xenial-inap-mtl01-0001226060 
devstack@placement-api.service[14403]:   warnings.warn(msg)
Aug 08 22:17:30.838067 ubuntu-xenial-inap-mtl01-0001226060 
devstack@placement-api.service[14403]:


Is there any way we can get rid of these?

Thanks,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Stepping down as coordinator for the Outreachy internships

2018-08-08 Thread Jay Pipes

On 08/08/2018 09:32 AM, Doug Hellmann wrote:

Excerpts from Victoria Martínez de la Cruz's message of 2018-08-07 20:47:28 
-0300:

Hi all,

I'm reaching you out to let you know that I'll be stepping down as
coordinator for OpenStack next round. I had been contributing to this
effort for several rounds now and I believe is a good moment for somebody
else to take the lead. You all know how important is Outreachy to me and
I'm grateful for all the amazing things I've done as part of the Outreachy
program and all the great people I've met in the way. I plan to keep
involved with the internships but leave the coordination tasks to somebody
else.

If you are interested in becoming an Outreachy coordinator, let me know and
I can share my experience and provide some guidance.

Thanks,

Victoria


Thank you, Victoria. Mentoring new developers is an important
responsibility, and your patient service in working with the Outreachy
program has set a good example.


Big +1.

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-06 Thread Jay Pipes

On 08/04/2018 07:35 PM, Michael Glasgow wrote:

On 8/2/2018 7:27 PM, Jay Pipes wrote:
It's not an exception. It's normal course of events. NoValidHosts 
means there were no compute nodes that met the requested resource 
amounts.


To clarify, I didn't mean a python exception.


Neither did I. I was referring to exceptional behaviour, not a Python 
exception.



I concede that I should've chosen a better word for the type of
object I have in mind.

If a SELECT statement against an Oracle DB returns 0 rows, is that an 
exception? No. Would an operator need to re-send the SELECT statement 
with an EXPLAIN SELECT in order to get information about what indexes 
were used to winnow the result set (to zero)? Yes. Either that, or the 
operator would need to gradually re-execute smaller SELECT statements 
containing fewer filters in order to determine which join or predicate 
caused a result set to contain zero rows.


I'm not sure if this analogy fully appreciates the perspective of the 
operator.  You're correct of course that if you select on a db and the 
correct answer is zero rows, then zero rows is the right answer, 100% of 
the time.


Whereas what I thought we meant when we talk about "debugging no valid 
host failures" is that zero rows is *not* the right answer, and yet 
you're getting zero rows anyway.


No, "debugging no valid host failures" doesn't mean that zero rows is 
the wrong answer. It means "find out why Nova thinks there's nowhere 
that my instance will fit".



So yes, absolutely with an Oracle DB you would get an ORA-X
exception in that case, along with a trace file that told you where
things went off the rails.  Which is exactly what we don't have
here.


That is precisely the opposite of what I was saying. Again, getting no 
results is *not* an error. It's normal behaviour and indicates there 
were no compute hosts that met the requirements of the request. This is 
not an error or exceptional behaviour. It's simply the result of a query 
against the placement database.


If you get zero rows returned, that means you need to determine what 
part of your request caused the winnowed result set to go from >0 rows 
to 0 rows.


And what we've been discussing is exactly the process by which such an 
investigation could be done. There are two options: do the investigation 
*inline* as part of the original request or do it *offline* after the 
original request returns 0 rows.


Doing it inline means splitting the large query we currently construct 
into multiple queries (for each related group of requested resources 
and/or traits) and logging the number of results grabbed for each of 
those queries.


Doing if offline means developing some diagnostic tool that an operator 
could run (similar to what Windriver did with [1]). The issue with that 
is that the diagnostic tool can only represent the resource usage at the 
time the diagnostic tool was run, not when the original request that 
returned 0 rows ran.


[1] 
https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-94f87e728df6465becce5241f3da53c8R330


If I understand your perspective correctly, it's basically that 
placement is working as designed, so there's nothing more to do except 
pore over debug output.  Can we consider:


  (1) that might not always be true if there are bugs


Bugs in the placement service are an entirely separate issue. They do 
occur, of course, but we're not talking about that here.


  (2) even when it is technically true, from the user's perspective, I'd 
posit that it's rare that a user requests an instance with the express 
intent of not launching an instance. (?)  If they're "debugging" this 
issue, it means there's a misconfiguration or some unexpected state that 
they have to go find.


Depends on what you have in mind as a "user". If I launch an instance in 
an AWS region, I'd be very surprised if the service told me there was 
nowhere to place my instance unless of course I'd asked it to launch an 
instance with requirements that exceeded AWS' ability to launch.


If you're talking about a user of a private IT cloud with a single rack 
of compute hosts, that user might very well expect to see a return of 
"sorry mate, there's nowhere to put your request right now.".


There is no explicit or implicit SLA or guarantee that Nova needs to 
somehow create a place to put an instance when no such place exists to 
put the instance.


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Jay Pipes

On 08/02/2018 06:18 PM, Michael Glasgow wrote:

On 08/02/18 15:04, Chris Friesen wrote:

On 08/02/2018 01:04 PM, melanie witt wrote:


The problem is an infamous one, which is, your users are trying to boot
instances and they get "No Valid Host" and an instance in ERROR 
state. They contact support, and now support is trying to determine 
why NoValidHost happened. In the past, they would turn on DEBUG log 
level on the nova-scheduler, try another request, and take a look at 
the scheduler logs.


At a previous Summit[1] there were some operators that said they just 
always ran nova-scheduler with debug logging enabled in order to deal 
with this issue, but that it was a pain [...]


I would go a bit further and say it's likely to be unacceptable on a 
large cluster.  It's expensive to deal with all those logs and to 
manually comb through them for troubleshooting this issue type, which 
can happen frequently with some setups.  Secondarily there are 
performance and security concerns with leaving debug on all the time.


As to "defining the problem", I think it's what Melanie said.  It's 
about asking for X and the system saying, "sorry, can't give you X" with 
no further detail or even means of discovering it.


More generally, any time a service fails to deliver a resource which it 
is primarily designed to deliver, it seems to me at this stage that 
should probably be taken a bit more seriously than just "check the log 
file, maybe there's something in there?"  From the user's perspective, 
if nova fails to produce an instance, or cinder fails to produce a 
volume, or neutron fails to build a subnet, that's kind of a big deal, 
right?


In such cases, would it be possible to generate a detailed exception 
object which contains all the necessary info to ascertain why that 
specific failure occurred?


It's not an exception. It's normal course of events. NoValidHosts means 
there were no compute nodes that met the requested resource amounts.


There's plenty of ways the operator can get usage and trait information 
and determine if there are providers that meet the requested amounts and 
required/forbidden traits.


What we're talking about here is debugging information, plain and simple.

If a SELECT statement against an Oracle DB returns 0 rows, is that an 
exception? No. Would an operator need to re-send the SELECT statement 
with an EXPLAIN SELECT in order to get information about what indexes 
were used to winnow the result set (to zero)? Yes. Either that, or the 
operator would need to gradually re-execute smaller SELECT statements 
containing fewer filters in order to determine which join or predicate 
caused a result set to contain zero rows.


That's exactly what we're talking about here. It's not an exception. 
It's debugging information.


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Jay Pipes

On 08/02/2018 01:40 PM, Eric Fried wrote:

Jay et al-


And what I'm referring to is doing a single query per "related
resource/trait placement request group" -- which is pretty much what
we're heading towards anyway.

If we had a request for:

GET /allocation_candidates?
  resources0=VCPU:1&
  required0=HW_CPU_X86_AVX2,!HW_CPU_X86_VMX&
  resources1=MEMORY_MB:1024

and logged something like this:

DEBUG: [placement request ID XXX] request group 1 of 2 for 1 VCPU,
requiring HW_CPU_X86_AVX2, forbidding HW_CPU_X86_VMX, returned 10 matches

DEBUG: [placement request ID XXX] request group 2 of 2 for 1024
MEMORY_MB returned 3 matches

that would at least go a step towards being more friendly for debugging
a particular request's results.


Well, that's easy [1] (but I'm sure you knew that when you suggested
it). Produces logs like [2].

This won't be backportable, I'm afraid.

[1] https://review.openstack.org/#/c/588350/
[2] http://paste.openstack.org/raw/727165/


Yes.

And we could do the same kind of approach with the non-granular request 
groups by reducing the single large SQL statement that is used for all 
resources and all traits (and all agg associations) into separate SELECT 
statements.


It could be slightly less performance-optimized but more readable and 
easier to output debug logs like those above.
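
To make that concrete, here is a toy sketch of the per-resource-class
loop with debug logging (illustrative only -- the CAPACITY dict and the
helper below stand in for the real SELECTs against the
inventories/allocations tables):

    import logging

    LOG = logging.getLogger(__name__)

    # Toy data standing in for inventory/allocation tables:
    # provider uuid -> {resource class: free capacity}
    CAPACITY = {
        'rp1': {'VCPU': 8, 'MEMORY_MB': 512},
        'rp2': {'VCPU': 4, 'MEMORY_MB': 4096},
        'rp3': {'VCPU': 2, 'MEMORY_MB': 2048},
    }

    def _providers_with_capacity(rc_name, amount):
        # Stand-in for the per-resource-class query.
        return {rp for rp, inv in CAPACITY.items()
                if inv.get(rc_name, 0) >= amount}

    def get_matching_providers(resources):
        matching = None
        for rc_name, amount in resources.items():
            found = _providers_with_capacity(rc_name, amount)
            LOG.debug("found %d providers with capacity for the "
                      "requested %d %s", len(found), amount, rc_name)
            matching = found if matching is None else matching & found
            LOG.debug("%d providers remain after filtering on %s",
                      len(matching), rc_name)
            if not matching:
                break
        return matching or set()

    # get_matching_providers({'VCPU': 1, 'MEMORY_MB': 1024})
    # -> {'rp2', 'rp3'}, and the logs show MEMORY_MB is what dropped rp1.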


-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Jay Pipes

On 08/02/2018 01:12 AM, Alex Xu wrote:
2018-08-02 4:09 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:


On 08/01/2018 02:02 PM, Chris Friesen wrote:

On 08/01/2018 11:32 AM, melanie witt wrote:

I think it's definitely a significant issue that
troubleshooting "No allocation
candidates returned" from placement is so difficult.
However, it's not
straightforward to log detail in placement when the request
for allocation
candidates is essentially "SELECT * FROM nodes WHERE cpu
usage < needed and disk
usage < needed and memory usage < needed" and the result is
returned from the API.


I think the only way to get useful info on a failure would be to
break down the huge SQL statement into subclauses and store the
results of the intermediate queries.


This is a good idea and something that can be done.


That sounds like you need a separate SQL query for each resource to get 
the intermediate results; won't that have worse performance than a single 
query to get the final result?


No, not necessarily.

And what I'm referring to is doing a single query per "related 
resource/trait placement request group" -- which is pretty much what 
we're heading towards anyway.


If we had a request for:

GET /allocation_candidates?
 resources0=VCPU:1&
 required0=HW_CPU_X86_AVX2,!HW_CPU_X86_VMX&
 resources1=MEMORY_MB:1024

and logged something like this:

DEBUG: [placement request ID XXX] request group 1 of 2 for 1 VCPU, 
requiring HW_CPU_X86_AVX2, forbidding HW_CPU_X86_VMX, returned 10 matches


DEBUG: [placement request ID XXX] request group 2 of 2 for 1024 
MEMORY_MB returned 3 matches


that would at least go a step towards being more friendly for debugging 
a particular request's results.


-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Jay Pipes

On 08/01/2018 02:02 PM, Chris Friesen wrote:

On 08/01/2018 11:32 AM, melanie witt wrote:

I think it's definitely a significant issue that troubleshooting "No 
allocation

candidates returned" from placement is so difficult. However, it's not
straightforward to log detail in placement when the request for 
allocation
candidates is essentially "SELECT * FROM nodes WHERE cpu usage < 
needed and disk
usage < needed and memory usage < needed" and the result is returned 
from the API.


I think the only way to get useful info on a failure would be to break 
down the huge SQL statement into subclauses and store the results of the 
intermediate queries.


This is a good idea and something that can be done.

Unfortunately, it's refactoring work and as a community, we tend to 
prioritize fancy features like NUMA topology and CPU pinning over 
refactoring work.


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] compute nodes use of placement

2018-07-30 Thread Jay Pipes

ack. will review shortly. thanks, Chris.

On 07/30/2018 02:20 PM, Chris Dent wrote:

On Mon, 30 Jul 2018, Jay Pipes wrote:


On 07/26/2018 12:15 PM, Chris Dent wrote:

The `in_tree` calls happen from the report client method
`_get_providers_in_tree` which is called by
`_ensure_resource_provider` which can be called from multiple
places, but in this case is being called both times from
`get_provider_tree_and_ensure_root`, which is also responsible for
two of the inventory request.

`get_provider_tree_and_ensure_root` is called by `_update` in the
resource tracker.

`_update` is called by both `_init_compute_node` and
`_update_available_resource`. Every single period job iteration.
`_init_compute_node` is called from _update_available_resource`
itself.

That accounts for the overall doubling.


Actually, no. What accounts for the overall doubling is the fact that 
we no longer short-circuit return from _update() when there are no 
known changes in the node's resources.


I think we're basically agreeing on this: I'm describing the current
state of affairs, not attempting to describe why it is that way.
Your insight helps to explain why.

I have a set of change in progress which experiments with what
happens if we don't call placement a second time in the _update
call:

   https://review.openstack.org/#/c/587050/

Just to see what might blow up.



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] compute nodes use of placement

2018-07-30 Thread Jay Pipes

On 07/26/2018 12:15 PM, Chris Dent wrote:

The `in_tree` calls happen from the report client method
`_get_providers_in_tree` which is called by
`_ensure_resource_provider` which can be called from multiple
places, but in this case is being called both times from
`get_provider_tree_and_ensure_root`, which is also responsible for
two of the inventory request.

`get_provider_tree_and_ensure_root` is called by `_update` in the
resource tracker.

`_update` is called by both `_init_compute_node` and
`_update_available_resource`. Every single period job iteration.
`_init_compute_node` is called from _update_available_resource`
itself.

That accounts for the overall doubling.


Actually, no. What accounts for the overall doubling is the fact that we 
no longer short-circuit return from _update() when there are no known 
changes in the node's resources.


We *used* to do a quick check of whether the resource tracker's local 
cache of resources had been changed, and just exit _update() if no 
changes were detected. However, this patch modified that so that we 
*always* call to get inventory, even if the resource tracker noticed no 
changes in resources:


https://github.com/openstack/nova/commit/e2a18a37190e4c7b7697a8811553d331e208182c

The reason for that change is because the virt driver was tracking vGPU 
resources now and those vGPU resources were not tracked by the resource 
tracker's local cache of resources.


Thus, we now always call the virt driver get_inventory() call (which 
morphed into the virt driver's update_provider_tree() call, but the 
change to update_provider_tree() didn't actually increase the number of 
calls to get inventories. It was the patch above that did that.
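
The shape of the change, reduced to a toy stand-in (not the actual
resource tracker code; method names are approximate):

    class ToyResourceTracker(object):
        def __init__(self, report_func, change_func):
            self._report = report_func           # pushes inventory to placement
            self._resource_change = change_func  # True if local cache changed

        def _update_before(self, context, compute_node):
            # Old behaviour: bail out on a quiet node, so no placement
            # calls were made when the local resource cache was unchanged.
            if not self._resource_change(compute_node):
                return
            self._report(context, compute_node)

        def _update_after(self, context, compute_node):
            # New behaviour: always refresh, because the virt driver may
            # own inventory (e.g. vGPU) that the local cache never tracks.
            self._report(context, compute_node)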


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] keypair quota usage info for user

2018-07-27 Thread Jay Pipes

On 07/27/2018 03:21 PM, Matt Riedemann wrote:

On 7/27/2018 2:14 PM, Matt Riedemann wrote:
 From checking the history and review discussion on [3], it seems 
that it was like that from staring. key_pair quota is being counted 
when actually creating the keypair but it is not shown in API 
'in_use' field.


Just so I'm clear which API we're talking about, you mean there is no 
totalKeypairsUsed entry in 
https://developer.openstack.org/api-ref/compute/#show-rate-and-absolute-limits 
correct?


Nevermind I see it now:

https://developer.openstack.org/api-ref/compute/#show-the-detail-of-quota

We have too many quota-related APIs.


Yes. Yes we do.

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] OpenStack lagging behind 2 major python versions: we need a Python 3.7 gate

2018-07-18 Thread Jay Pipes

On 07/18/2018 12:42 AM, Ian Wienand wrote:

The ideal is that a (say) Neutron dev gets a clear traceback from a
standard Python error in their change and happily fixes it.  The
reality is probably more like this developer gets a tempest
failure due to nova failing to boot a cirros image, stemming from a
detached volume due to a qemu bug that manifests due to a libvirt
update (I'm exaggerating, I know :).


Not really exaggerating. :)

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] How to look up a project name from Neutron server code?

2018-07-17 Thread Jay Pipes

On 07/17/2018 03:36 AM, Neil Jerram wrote:
Can someone help me with how to look up a project name (aka tenant name) 
for a known project/tenant ID, from code (specifically a mechanism 
driver) running in the Neutron server?


I believe that means I need to make a GET REST call as here: 
https://developer.openstack.org/api-ref/identity/v3/index.html#projects.  But 
I don't yet understand how a piece of Neutron server code can ensure 
that it has the right credentials to do that.  If someone happens to 
have actual code for doing this, I'm sure that would be very helpful.


(I'm aware that whenever the Neutron server processes an API request, 
the project name for the project that generated that request is added 
into the request context.  That is great when my code is running in an 
API request context.  But there are other times when the code isn't in a 
request context and still needs to map from a project ID to project 
name; hence the question here.)


Hi Neil,

You basically answered your own question above :) The neutron request 
context gets built from oslo.context's Context.from_environ() [1] which 
has this note in the implementation [2]:


# Load a new context object from the environment variables set by
# auth_token middleware. See:
# 
https://docs.openstack.org/keystonemiddleware/latest/api/keystonemiddleware.auth_token.html#what-auth-token-adds-to-the-request-for-use-by-the-openstack-service


So, basically, simply look at the HTTP headers for HTTP_X_PROJECT_NAME. 
If you don't have access to a HTTP headers, then you'll need to pass 
some context object/struct to the code you're referring to. Might as 
well pass the neutron RequestContext (derived from oslo_context.Context) 
to the code you're referring to and you get all this for free.
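
A minimal sketch of both options (function names are mine, not neutron
code), assuming the request passed through keystonemiddleware's
auth_token:

    from oslo_context import context

    def project_name_from_environ(environ):
        # auth_token sets X-Project-Name on every authenticated request;
        # in the WSGI environ that becomes HTTP_X_PROJECT_NAME.
        return environ.get('HTTP_X_PROJECT_NAME')

    def project_name_from_context(environ):
        # Or build (or reuse) an oslo.context RequestContext -- which is
        # what the neutron request context derives from -- and read the
        # attribute directly.
        ctxt = context.RequestContext.from_environ(environ)
        return ctxt.project_name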


Best,
-jay

[1] 
https://github.com/openstack/oslo.context/blob/4abd5377e4d847102a4e87a528d689e31cc1713c/oslo_context/context.py#L424


[2] 
https://github.com/openstack/oslo.context/blob/4abd5377e4d847102a4e87a528d689e31cc1713c/oslo_context/context.py#L433-L435


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] about block device driver

2018-07-16 Thread Jay Pipes

On 07/16/2018 10:15 AM, arkady.kanev...@dell.com wrote:

Is this for ephemeral storage handling?


For both ephemeral as well as root disk.

In other words, just act like Cinder isn't there and attach a big local 
root disk to the instance.


Best,
-jay


-Original Message-
From: Jay Pipes [mailto:jaypi...@gmail.com]
Sent: Monday, July 16, 2018 8:44 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [cinder] about block device driver

On 07/16/2018 09:32 AM, Sean McGinnis wrote:

The other option would be to not use Cinder volumes so you just use
local storage on your compute nodes.


^^ yes, this.

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [placement] Low hanging fruit bug for interested newcomers

2018-07-16 Thread Jay Pipes

Hi all,

Here's a testing and documentation bug that would be great for newcomers 
to the placement project:


https://bugs.launchpad.net/nova/+bug/1781439

Come find us on #openstack-placement on Freenode IRC to chat about it if 
you're interested!


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] about block device driver

2018-07-16 Thread Jay Pipes

On 07/16/2018 09:32 AM, Sean McGinnis wrote:

The other option would be to not use Cinder volumes so you just use local
storage on your compute nodes.


^^ yes, this.

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][placement] placement update 18-28

2018-07-16 Thread Jay Pipes
This is placement update 18-28, a weekly update of ongoing development 
related to the [OpenStack](https://www.openstack.org/) [placement 
service](https://developer.openstack.org/api-ref/placement/).


This week I'm trying to fill Chris' esteemable shoes while he's away.

# Most Important

## Reshape Provider Trees

Code series: 



There are at least four different contributors working on various parts 
of the "reshape provider trees" spec implementation 
. Three of the four were 
blocked on work I was supposed to complete around the single DB 
transaction for modifying inventory and allocation records atomically. 
So, this was the focus of work for week 28, in order to unblock other 
contributors.


Work on the primary patch in the series is ongoing, with excellent 
feedback and code additions from Eric Fried, Tetsuro Nakamura, Balazs 
Gibizer and Chris Dent. We hope to have this code merged in the next day 
or two.


There are WIPs for the HTTP parts and the resource tracker parts, on 
that topic, but both of those are dependent on the DB work merging.


# Important bug categories

In week 27 we discovered a set of bugs related to consumers and the 
handling of consumer generations. Most of these have now been fixed. 
Here is a list of these bugs along with their status:


* No ability to update consumer's project/user external ID
  FIX RELEASED
* Possible race updating consumer's project/user
  NEW
* default missing project/user in placement is invalid UUID
  FIX RELEASED
* Consumers with no allocations should be auto-deleted
  FIX RELEASED
* Auto-created consumer record not clean up after fail allocation
  FIX RELEASED
* Making new allocation for one consumer and multiple providers gives 
409 Conflict
  FIX RELEASED
* AllocationList.delete_all() incorrectly assumes a single consumer
  IN PROGRESS
* Consumers never get deleted
  FIX RELEASED
* ensure-consumer gabbi test uses invalid consumer id
  IN PROGRESS
* return 404 when no consumer found in allocs
  IN PROGRESS (lower priority now that consumers with no allocations 
are auto-deleted)

**DECISION MADE**: The team made a decision to automatically remove any 
consumer record when there were no more allocations for that consumer. 
Remember that for the Nova use case, a consumer is either an instance or 
an on-going migration. So, when an instance is terminated, the consumer 
record that stores attributes about the instance -- such as the project 
and user IDs -- is now removed.


The other area of bugginess that was uncovered in week 27 and addressed 
in week 18 was related to various ways in which managing parents of 
nested providers was incorrect. Those were:


* placement allows RP parent loop in PUT resource_providers/{uuid}

FIX RELEASED
* Child's root provider is not updated

FIX RELEASED

Both of those, as you can see, have been fixed.

# Bugs

* Placement related [bugs not yet in progress](https://goo.gl/TgiPXb): 
15, -1 on last week.

* [In progress placement bugs](https://goo.gl/vzGGDQ) 14, -3 on last week.

# Other

The following continue to remain from the previous week and are copied 
verbatim from Chris' week 27 update.


* Purge comp_node and res_prvdr records during deletion of cells/hosts
* Get resource provider by uuid or name (osc-placement)
* Tighten up ReportClient use of generation
* Add unit test for non-placement resize
* Move refresh time from report client to prov tree
* PCPU resource class
* rework how we pass candidate request information
* add root parent NULL online migration
* add resource_requests field to RequestSpec
* Convert driver supported capabilities to compute node provider traits
* Use placement.inventory.inuse in report client
* ironic: Report resources as reserved

Re: [openstack-dev] [nova] What do we lose if the reshaper stuff doesn't land in Rocky?

2018-07-12 Thread Jay Pipes

DB work is now pushed for the single transaction reshape() function:

https://review.openstack.org/#/c/582383

Note that in working on that, I uncovered a bug in 
AllocationList.delete_all() which needed to first be fixed:


https://bugs.launchpad.net/nova/+bug/1781430

A fix has been pushed here:

https://review.openstack.org/#/c/582382/

Best,
-jay

On 07/12/2018 10:45 AM, Matt Riedemann wrote:
Continuing the discussion from the nova meeting today [1], I'm trying to 
figure out what the risk / benefit / contingency is if we don't get the 
reshaper stuff done in Rocky.


In a nutshell, we need reshaper to migrate VGPU inventory for the 
libvirt and xenapi drivers from the root compute node resource provider 
to child providers in the compute node provider tree, because then we 
can support multiple VGPU type inventory on the same compute host. [2]


Looking at the status of the vgpu-rocky blueprint [3], the libvirt 
changes are in merge conflict but the xenapi changes are ready to go.


What I'm wondering is if we don't get reshaper done in Rocky, what does 
that prevent us from doing in Stein? For example, does it mean we can't 
support modeling NUMA in placement until the T release? Or does it just 
mean that we lose the upgrade window from Rocky to Stein such that we 
expect people to run the reshaper migration so that Stein code can 
assume the migration has been done and model nested resource providers?


If the former (no NUMA modeling until T), that's a big deal. If the 
latter, it makes the Stein code more complicated but it doesn't sound 
impossible, right? Wouldn't the Stein code just need to add some 
checking to see if the migration has been done before it can support 
some new features?


Obviously if we don't have reshaper done in Rocky then the xenapi driver 
can't support multiple VGPU types on the same compute host in Rocky - 
but isn't that kind of the exact same situation if we don't get reshaper 
done until Stein?


[1] 
http://eavesdrop.openstack.org/meetings/nova/2018/nova.2018-07-12-14.00.log.html#l-71 

[2] 
https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/vgpu-rocky.html 

[3] 
https://review.openstack.org/#/q/topic:bp/vgpu-rocky+(status:open+OR+status:merged) 





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] What do we lose if the reshaper stuff doesn't land in Rocky?

2018-07-12 Thread Jay Pipes
Let's just get the darn thing done in Rocky. I will have the DB work up 
for review today.


-jay

On 07/12/2018 10:45 AM, Matt Riedemann wrote:
Continuing the discussion from the nova meeting today [1], I'm trying to 
figure out what the risk / benefit / contingency is if we don't get the 
reshaper stuff done in Rocky.


In a nutshell, we need reshaper to migrate VGPU inventory for the 
libvirt and xenapi drivers from the root compute node resource provider 
to child providers in the compute node provider tree, because then we 
can support multiple VGPU type inventory on the same compute host. [2]


Looking at the status of the vgpu-rocky blueprint [3], the libvirt 
changes are in merge conflict but the xenapi changes are ready to go.


What I'm wondering is if we don't get reshaper done in Rocky, what does 
that prevent us from doing in Stein? For example, does it mean we can't 
support modeling NUMA in placement until the T release? Or does it just 
mean that we lose the upgrade window from Rocky to Stein such that we 
expect people to run the reshaper migration so that Stein code can 
assume the migration has been done and model nested resource providers?


If the former (no NUMA modeling until T), that's a big deal. If the 
latter, it makes the Stein code more complicated but it doesn't sound 
impossible, right? Wouldn't the Stein code just need to add some 
checking to see if the migration has been done before it can support 
some new features?


Obviously if we don't have reshaper done in Rocky then the xenapi driver 
can't support multiple VGPU types on the same compute host in Rocky - 
but isn't that kind of the exact same situation if we don't get reshaper 
done until Stein?


[1] 
http://eavesdrop.openstack.org/meetings/nova/2018/nova.2018-07-12-14.00.log.html#l-71 

[2] 
https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/vgpu-rocky.html 

[3] 
https://review.openstack.org/#/q/topic:bp/vgpu-rocky+(status:open+OR+status:merged) 





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] placement update 18-27

2018-07-10 Thread Jay Pipes

On 07/06/2018 10:09 AM, Chris Dent wrote:

# Questions

* Will consumer id, project and user id always be a UUID? We've
   established for certain that user id will not, but things are
   less clear for the other two. This issue is compounded by the
   fact that these two strings are different but the same UUID:
   5eb033fd-c550-420e-a31c-3ec2703a403c,
   5eb033fdc550420ea31c3ec2703a403c (bug 1758057 mentioned above) but
   we treat them differently in our code.


As mentioned by a couple people on IRC, a consumer's external project 
identifier and external user identifier come directly from Keystone. 
Since Keystone has no rule about these values being UUIDs or even 
UUID-like, we clearly cannot treat them as UUIDs in the placement service.


Our backend data storage for these attributes is suitably a String(255) 
column and there is no validation done on these values. In fact, the 
project and user external identifiers are taken directly from the 
nova.context WSGI environ when sending from the placement client [1].


So, really, the only thing we're discussing is whether consumer_id is 
always a UUID.


I believe it should be, and the fact that it's referred to as 
consumer_uuid in so many places should be indicative of its purpose. I 
know originally the field in the DB was a String(64), but it's since 
been changed to String(36), further evidence that consumer_id was 
intended to be a UUID.


I believe we should validate it as such at the placement API layer. The 
only current consumers in the placement service are instances and 
migrations, both of which use a UUID identifier. I don't think it's too 
onerous to require future consumers to be identified with a UUID, and it 
would be nice to be able to rely on a structured, agreed format for 
unique identification of consumers across services.


As noted the project_id and user_id are not required to be UUIDs and I 
don't believe we should add any validation for those fields.
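
To illustrate the kind of API-layer check I have in mind (sketch only,
not actual placement code), something this small would do, and it has
the nice side effect that the dashed and undashed forms of the same
UUID -- the bug 1758057 situation -- normalize to the same value:

    import uuid

    def normalized_consumer_uuid(value):
        # Returns the canonical dashed form, or None if the value does
        # not parse as a UUID at all.
        try:
            return str(uuid.UUID(value))
        except (TypeError, ValueError, AttributeError):
            return None

    # Both '5eb033fd-c550-420e-a31c-3ec2703a403c' and
    # '5eb033fdc550420ea31c3ec2703a403c' normalize to the former;
    # project_id and user_id would remain unvalidated free-form strings.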


Best,
-jay

[1] For those curious, nova-scheduler calls 
scheduler.utils.claim_resources(...):


https://github.com/openstack/nova/blob/8469fa70dafa83cb068538679100bede7679edc3/nova/scheduler/filter_scheduler.py#L219

which itself calls reportclient.claim_resources(...) with the 
instance.user_id and instance.project_id values:


https://github.com/openstack/nova/blob/8469fa70dafa83cb068538679100bede7679edc3/nova/scheduler/utils.py#L500

The instance.project_id and instance.user_id values are populated from 
the WSGI environ here:


https://github.com/openstack/nova/blob/8469fa70dafa83cb068538679100bede7679edc3/nova/compute/api.py#L831-L832

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [placement] placement update 18-27

2018-07-10 Thread Jay Pipes

On 07/09/2018 02:52 PM, Chris Dent wrote:

On Fri, 6 Jul 2018, Chris Dent wrote:


This is placement update 18-27, a weekly update of ongoing
development related to the [OpenStack](https://www.openstack.org/)
[placement
service](https://developer.openstack.org/api-ref/placement/). This
is a contract version.


Forgot to mention: There won't be an 18-28 this Friday, I'll be out
and about. If someone else would like to do one, that would be
great.


On it.

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] [all] TC Report 18-26

2018-07-06 Thread Jay Pipes

On 07/06/2018 12:58 PM, Zane Bitter wrote:

On 02/07/18 19:13, Jay Pipes wrote:

Nova's primary competition is:

* Stand-alone Ironic
* oVirt and stand-alone virsh callers
* Parts of VMWare vCenter [3]
* MaaS in some respects


Do you see KubeVirt or Kata or Virtlet or RancherVM ending up on this 
list at any point? Because a lot of people* do.


* https://news.ycombinator.com/item?id=17013779


Please don't lose credibility by saying "a lot of people" see things 
like RancherVM as competitors to Nova [1] by pointing to a HackerNews 
[2] thread where two people discuss why RancherVM exists and where one 
of those people is Darren Shepherd, a co-founder of Rancher, previously 
at Citrix and GoDaddy with a long-known distaste for all things OpenStack.


I don't think that thread is particularly unbiased or helpful.

I'll respond to the rest of your (excellent) points a little later...

Best,
-jay

[1] Nova isn't actually mentioned there. "OpenStack" is.

[2] I've often wondered who has time to actually respond to anything on 
HackerNews. Same for when Slashdot was a thing. In fact, now that I 
think about it, I spend entirely too much time worrying about all of 
this stuff... ;)


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] [all] TC Report 18-26

2018-07-03 Thread Jay Pipes
optional component, by there still not being a way to download secrets 
to a vm securely from the secret store, by the secret store also being 
completely optional, etc. An app developer can't rely on any of it. :/ Heat is 
hamstrung by the lack of blessing so many other OpenStack services are. You 
can't fix it until you fix that fundamental brokenness in OpenStack.


I guess I just fundamentally disagree that having a monolithic 
all-things-for-all-users application architecture and feature set is 
something that OpenStack should be.


There is a *reason* that Kubernetes jettisoned all the cloud provider 
code from its core. The reason is because setting up that base stuff is 
*hard* and that work isn't germane to what Kubernetes is (a container 
orchestration system, not a datacenter resource management system).



Heat is also hamstrung being an orchestrator of existing API's by there being 
holes in the API's.


I agree there are some holes in some of the APIs. Happy to work on 
plugging those holes as long as the holes are properly identified as 
belonging to the correct API and are not simply a feature request what 
would expand the scope of lower-level plumbing services like Nova.



Think of OpenStack like a game console. The moment you make a component 
optional and make it takes extra effort to obtain, few software developers 
target it and rarely does anyone one buy the addons it because there isn't 
software for it. Right now, just about everything in OpenStack is an addon. 
Thats a problem.


I don't have any game consoles nor do I develop software for them, so I 
don't really see the correlation here. That said, I'm 100% against a 
monolithic application approach, as I've mentioned before.


Best,
-jay


Thanks,
Kevin



From: Jay Pipes [jaypi...@gmail.com]
Sent: Monday, July 02, 2018 4:13 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [tc] [all] TC Report 18-26

On 06/27/2018 07:23 PM, Zane Bitter wrote:

On 27/06/18 07:55, Jay Pipes wrote:

Above, I was saying that the scope of the *OpenStack* community is
already too broad (IMHO). An example of projects that have made the
*OpenStack* community too broad are purpose-built telco applications
like Tacker [1] and Service Function Chaining. [2]

I've also argued in the past that all distro- or vendor-specific
deployment tools (Fuel, Triple-O, etc [3]) should live outside of
OpenStack because these projects are more products and the relentless
drive of vendor product management (rightfully) pushes the scope of
these applications to gobble up more and more feature space that may
or may not have anything to do with the core OpenStack mission (and
have more to do with those companies' product roadmap).


I'm still sad that we've never managed to come up with a single way to
install OpenStack. The amount of duplicated effort expended on that
problem is mind-boggling. At least we tried though. Excluding those
projects from the community would have just meant giving up from the
beginning.


You have to have motivation from vendors in order to achieve said single
way of installing OpenStack. I gave up a long time ago on distros and
vendors to get behind such an effort.

Where vendors see $$$, they will attempt to carve out value
differentiation. And value differentiation leads to, well, differences,
naturally.

And, despite what some might misguidedly think, Kubernetes has no single
installation method. Their *official* setup/install page is here:

https://kubernetes.io/docs/setup/pick-right-solution/

It lists no fewer than *37* (!) different ways of installing Kubernetes,
and I'm not even including anything listed in the "Custom Solutions"
section.


I think Thierry's new map, that collects installer services in a
separate bucket (that may eventually come with a separate git namespace)
is a helpful way of communicating to users what's happening without
forcing those projects outside of the community.


Sure, I agree the separate bucket is useful, particularly when paired
with information that allows operators to know how stable and/or
bleeding edge the code is expected to be -- you know, those "tags" that
the TC spent time curating.


So to answer your question:

 zaneb: yeah... nobody I know who argues for a small stable
core (in Nova) has ever said there should be fewer higher layer
services.
 zaneb: I'm not entirely sure where you got that idea from.


Note the emphasis on *Nova* above?

Also note that when I've said that *OpenStack* should have a smaller
mission and scope, that doesn't mean that higher-level services aren't
necessary or wanted.


Thank you for saying this, and could I please ask you to repeat this
disclaimer whenever you talk about a smaller scope for OpenStack.


Yes. I shall shout it from the highest mountains. [1]


Because for those of us working on higher-level services it feels like
there has been a non-stop chorus (both inside and outside the project) 
of people wanting to redefine OpenStack as something that doesn't 
include us.

Re: [openstack-dev] [barbican][cinder][glance][nova] Goodbye from JHUAPL

2018-07-03 Thread Jay Pipes
Thanks so much for your contributions to our ecosystem, Brianna! I'm sad 
to see you go! :(


Best,
-jay

On 07/03/2018 03:13 PM, Poulos, Brianna L. wrote:

All,

After over five years of contributing security features to OpenStack, 
the JHUAPL team is wrapping up our involvement with OpenStack.


To all who have reviewed/improved/accepted our contributions, thank 
you.  It has been a pleasure to be a part of the community.


Regards,

The JHUAPL OpenStack Team



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] [all] TC Report 18-26

2018-07-03 Thread Jay Pipes

On 07/02/2018 03:31 PM, Zane Bitter wrote:

On 28/06/18 15:09, Fox, Kevin M wrote:
  * made the barrier to testing/development as low as 'curl 
http://..minikube; minikube start' (this spurs adoption and 
contribution)


That's not so different from devstack though.

  * not having large silo's in deployment projects allowed better 
communication on common tooling.
  * Operator focused architecture, not project based architecture. 
This simplifies the deployment situation greatly.
  * try whenever possible to focus on just the commons and push vendor 
specific needs to plugins so vendors can deal with vendor issues 
directly and not corrupt the core.


I agree with all of those, but to be fair to OpenStack, you're leaving 
out arguably the most important one:


     * Installation instructions start with "assume a working datacenter"

They have that luxury; we do not. (To be clear, they are 100% right to 
take full advantage of that luxury. Although if there are still folks 
who go around saying that it's a trivial problem and OpenStackers must 
all be idiots for making it look so difficult, they should really stop 
embarrassing themselves.)


This.

There is nothing trivial about the creation of a working datacenter -- 
never mind a *well-running* datacenter. Comparing Kubernetes to 
OpenStack -- particular OpenStack's lower levels -- is missing this 
fundamental point and ends up comparing apples to oranges.


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [sqlalchemy][db][oslo.db][mistral] Is there a recommended MySQL driver for OpenStack projects?

2018-07-03 Thread Jay Pipes

On 07/03/2018 08:47 AM, Doug Hellmann wrote:

If you have a scaling issue that may be solved by eventlet, that's
one thing, but please don't adopt eventlet just because a lot of
other projects have.  We've tried several times to minimize our
reliance on eventlet because new releases tend to introduce bugs.

Have you tried the 'threading' executor?


+1

-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] [all] TC Report 18-26

2018-07-02 Thread Jay Pipes

On 06/27/2018 07:23 PM, Zane Bitter wrote:

On 27/06/18 07:55, Jay Pipes wrote:
Above, I was saying that the scope of the *OpenStack* community is 
already too broad (IMHO). An example of projects that have made the 
*OpenStack* community too broad are purpose-built telco applications 
like Tacker [1] and Service Function Chaining. [2]


I've also argued in the past that all distro- or vendor-specific 
deployment tools (Fuel, Triple-O, etc [3]) should live outside of 
OpenStack because these projects are more products and the relentless 
drive of vendor product management (rightfully) pushes the scope of 
these applications to gobble up more and more feature space that may 
or may not have anything to do with the core OpenStack mission (and 
have more to do with those companies' product roadmap).


I'm still sad that we've never managed to come up with a single way to 
install OpenStack. The amount of duplicated effort expended on that 
problem is mind-boggling. At least we tried though. Excluding those 
projects from the community would have just meant giving up from the 
beginning.


You have to have motivation from vendors in order to achieve said single 
way of installing OpenStack. I gave up a long time ago on distros and 
vendors to get behind such an effort.


Where vendors see $$$, they will attempt to carve out value 
differentiation. And value differentiation leads to, well, differences, 
naturally.


And, despite what some might misguidedly think, Kubernetes has no single 
installation method. Their *official* setup/install page is here:


https://kubernetes.io/docs/setup/pick-right-solution/

It lists no fewer than *37* (!) different ways of installing Kubernetes, 
and I'm not even including anything listed in the "Custom Solutions" 
section.


I think Thierry's new map, that collects installer services in a 
separate bucket (that may eventually come with a separate git namespace) 
is a helpful way of communicating to users what's happening without 
forcing those projects outside of the community.


Sure, I agree the separate bucket is useful, particularly when paired 
with information that allows operators to know how stable and/or 
bleeding edge the code is expected to be -- you know, those "tags" that 
the TC spent time curating.



So to answer your question:

 zaneb: yeah... nobody I know who argues for a small stable 
core (in Nova) has ever said there should be fewer higher layer 
services.

 zaneb: I'm not entirely sure where you got that idea from.


Note the emphasis on *Nova* above?

Also note that when I've said that *OpenStack* should have a smaller 
mission and scope, that doesn't mean that higher-level services aren't 
necessary or wanted.


Thank you for saying this, and could I please ask you to repeat this 
disclaimer whenever you talk about a smaller scope for OpenStack.


Yes. I shall shout it from the highest mountains. [1]

Because for those of us working on higher-level services it feels like 
there has been a non-stop chorus (both inside and outside the project) 
of people wanting to redefine OpenStack as something that doesn't 
include us.


I've said in the past (on Twitter, can't find the link right now, but 
it's out there somewhere) something to the effect of "at some point, 
someone just needs to come out and say that OpenStack is, at its core, 
Nova, Neutron, Keystone, Glance and Cinder".


Perhaps this is what you were recollecting. I would use a different 
phrase nowadays to describe what I was thinking with the above.


I would say instead "Nova, Neutron, Cinder, Keystone and Glance [2] are 
a definitive lower level of an OpenStack deployment. They represent a 
set of required integrated services that supply the most basic 
infrastructure for datacenter resource management when deploying OpenStack."


Note the difference in wording. Instead of saying "OpenStack is X", I'm 
saying "These particular services represent a specific layer of an 
OpenStack deployment".


Nowadays, I would further add something to the effect of "Depending on 
the particular use cases and workloads the OpenStack deployer wishes to 
promote, an additional layer of services provides workload orchestration 
and workflow management capabilities. This layer of services include 
Heat, Mistral, Tacker, Service Function Chaining, Murano, etc".


Does that provide you with some closure on this feeling of "non-stop 
chorus" of exclusion that you mentioned above?


The reason I haven't dropped this discussion is because I really want to 
know if _all_ of those people were actually talking about something else 
(e.g. a smaller scope for Nova), or if it's just you. Because you and I 
are in complete agreement that Nova has grown a lot of obscure 
capabilities that make it fiendishly difficult to maintain, and that in 
many cases might never have been requested if we'd had higher-level 
tools that could meet the same us
