Re: [openstack-dev] RFC - Icehouse logging harmonization

Sean Dague Wed, 23 Oct 2013 13:00:21 -0700

On 10/23/2013 03:35 PM, Robert Collins wrote:

On 24 October 2013 08:28, John Griffith <[email protected]> wrote:

So I touched on this a bit in my earlier post but want to reiterate here and
maybe clarify a bit.  I agree that cleaning up and standardizing the logs is
a good thing, and particularly removing unhandled exception messages would
be good.  What concerns me however is the approach being taken here of
saying things like "Error level messages are banned from Tempest runs".


The case I mentioned earlier of the negative test is a perfect example.
There's no way for Cinder (or any other service) to know the difference
between the end user specifying/requesting a non-existent volume and a valid
volume being requested that for some reason can't be found.  I'm not quite
sure how you place a definitive rule like "no error messages in logs" unless
you make your tests such that you never run negative tests?


Let me check that I understand: you want to check that when a user
asks for a volume that doesn't exist, they don't get it, *and* that
the reason they didn't get it was due to Cinder detecting it's
missing, not due to e.g. cinder throwing an error and returning 500 ?

If so, that seems pretty straightforward; a) check the error that is
reported (it should be a 404 and contain an explanation which we can
check) and b) check the logs to see that nothing was logged (because a
server fault would be logged).

There are other cases in cinder as well that I'm concerned about.  One
example is iscsi target creation, there are a number of scenarios where this
can fail under certain conditions.  In most of these cases we now have retry
mechanisms or alternate implementations to complete the task.  The fact is
however that a call somewhere in the system failed, this should be something
in my opinion that stands out in the logs.  Maybe this particular case would
be well suited to being a warning rather than an error, and that's fine.  My
point, though, is that I think some thought needs to go into this before
making blanket rules and especially gating criteria that say "no error
messages in logs".

Absolutely agreed. That's why I wanted to kick off this discussion and am
thinking about how we get to agreement by Icehouse (giving this lots of
time to bake and getting different perspectives in here).

In the short term, on failing jobs in tempest because they've got errors
in the logs, we've got a whole whitelist mechanism right now for
"acceptable errors". Over time I'd love to shrink that to 0. But that's
going to be a collaboration between the QA team and the specific core
projects to make sure that's the right call in each case. Who knows,
maybe there are generally agreed-upon ERROR conditions that we trigger,
but we'll figure that out over time.
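
To make the whitelist idea concrete, here is a rough sketch of that kind of
check. It is not the actual tempest mechanism; the function name, the
pattern list, and the log line format (a level token surrounded by spaces)
are all assumptions for illustration.

```python
import re

# Hypothetical whitelist: patterns for ERROR lines that are tolerated for now.
ALLOWED_ERRORS = [
    re.compile(r"Failed to create iscsi target"),
]


def unexpected_errors(log_lines, allowed=ALLOWED_ERRORS):
    """Return the ERROR/CRITICAL lines that match no whitelist pattern."""
    flagged = []
    for line in log_lines:
        # Assumed format: "<timestamp> <LEVEL> <logger> <message>".
        if " ERROR " not in line and " CRITICAL " not in line:
            continue
        if any(pattern.search(line) for pattern in allowed):
            continue
        flagged.append(line)
    return flagged
```

A gate job would fail when this returns anything, and shrinking the
whitelist to 0 means deleting patterns one by one as each project agrees.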

I think the iscsi example is a good case for WARNING, which is the same
level we use when we fail to schedule a resource (compute / volume),
especially because we try to recover now. If we fail to recover, ERROR
is probably called for. But if we actually failed to allocate a volume,
we'd end up failing the tests anyway, which means the ERROR in the log
wouldn't be a problem in and of itself.
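
Roughly what I have in mind for the retry case, as a sketch only. The
helper name, the logger name, and the attempt count are made up here, and
this is not Cinder's actual retry code.

```python
import logging

LOG = logging.getLogger("volume.sketch")  # hypothetical logger name


def run_with_retries(operation, attempts=3):
    """Call operation(); log WARNING while we can still recover, ERROR if not."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt < attempts:
                # A call failed, but we are about to retry: something is odd.
                LOG.warning("attempt %d/%d failed, retrying: %s",
                            attempt, attempts, exc)
            else:
                # Recovery failed: this is a real problem for the admin.
                LOG.error("giving up after %d attempts: %s", attempts, exc)
                raise
```

The failed call still stands out in the logs, but only the unrecoverable
case shows up at ERROR.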

I agree thought and care is needed. As a deployer my concern is that
the only time ERROR is logged in the logs is when something is wrong
with the infrastructure (rather than a user asking for something
stupid). I think my concern and yours can both be handled at the same
time.

Right, and I think this is the perspective that I'm coming from. Our
logs (at INFO and up) are UX to our cloud admins.

We should be pretty sure that we know something is a problem if we tag
it as an ERROR or CRITICAL, because that's likely to be something that
negatively impacts someone's day.

If we aren't completely sure your cloud is on fire, but we're pretty
sure something is odd, WARNING is appropriate.

If it's no good, but we have no way to test if it's a problem, it's just
INFO. I really think the "not found" case falls more into standard INFO.
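
For the "not found" case, a sketch of how the split could look in a
handler: a user asking for a missing volume gets a 404 and an INFO line,
while anything unexpected gets a 500 and an ERROR with a traceback. The
exception class, function name, and dict-backed store are hypothetical,
not Cinder's API.

```python
import logging

LOG = logging.getLogger("api.sketch")  # hypothetical logger name


class VolumeNotFound(Exception):
    """Hypothetical exception for a volume ID that does not exist."""


def show_volume(volumes, volume_id):
    """Return (status, body); user errors log at INFO, server faults at ERROR."""
    try:
        if volume_id not in volumes:
            raise VolumeNotFound(volume_id)
        return 200, volumes[volume_id]
    except VolumeNotFound:
        # The user asked for something that isn't there: not an infra problem.
        LOG.info("volume %s not found", volume_id)
        return 404, "volume %s could not be found" % volume_id
    except Exception:
        # Anything else is a server fault; LOG.exception logs at ERROR.
        LOG.exception("unexpected failure looking up volume %s", volume_id)
        return 500, "internal server error"
```

A negative test can then check for the 404 and its explanation, and check
that nothing at ERROR was logged.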

Again, more concrete instances, like the iscsi case, are probably the
most helpful. I think in the abstract this problem is too hard to solve,
but with examples, we can probably come to some consensus.


        -Sean

--
Sean Dague
http://dague.net
