I realise the chances of any help on this are slim, given the current advice to 
move to cfengine3, but I don't have the resources to do that.  I've just hit a 
bug in cfengine2 which is a royal pain, and I was hoping someone could help me 
fix it if possible.  Actually it's two things: one definite bug, and one 
side-effect of the way cfengine works that prevents me from hacking my way 
around the first bug.

Consider a largish organisation using methods to do certain things (I use them 
for setting values in /etc/sysctl.conf, for example).  You have several admins 
editing different parts of the policy, probably for different sets of machines. 
These should not interfere with each other, but I've found a case where they 
do.  Here's a toy example that shows the problem: a tiny method which does 
little more than append its argument to /tmp/foo (on linux hosts) and raise an 
alert to say it has run:

control:
  MethodName       = ( TestMethod )
  MethodParameters = ( data )
  actionsequence = ( editfiles )

classes:
         dummy = ( any )

editfiles:
  linux::
    { /tmp/foo
      AppendIfNoSuchLine "${data}"
    }

alerts:
  dummy::
    "Executing method with ${data}"
    ReturnVariables(void)
    ReturnClasses(void)

Now, call this from a cfengine policy:

control:
  any::
    actionsequence = ( methods )

methods:
  linux::
    TestMethod("wibble")
      action=cf.TestMethod returnv=null returnc=null

All is well; this works.

Problem 1)

Now another admin comes along and adds another stanza for some other system.  
It might be in an import in a faraway part of the policy, but the effect is the 
same as if this had been written:

control:
  any::
    actionsequence = ( methods )

methods:
  solaris::
    TestMethod("wibble")
      action=cf.TestMethod returnv=null returnc=null
  linux::
    TestMethod("wibble")
      action=cf.TestMethod returnv=null returnc=null

When you run this policy on a linux host, the method is never called.  The 
solaris stanza causes the method dispatch to take a lock with a one-minute 
ifelapsed timer *even though the method isn't actually being executed in that 
context*, so the linux stanza never gets executed.  I've reproduced this bug on 
2.2.8 and 2.2.10.

So what can I do about this?  I realise I could refactor the cfengine policy 
so that any given set of method invocation arguments is only ever called once, 
but that doesn't stop the problem resurfacing accidentally, especially if you 
want sysadmin teams to be reasonably independent in their work on their own 
machines.

One workaround I'm considering is hardcoding the ifelapsed and expireafter 
values to zero in the GetLock calls for the methods-dispatch database in do.c.  
I don't *think* this is harmful, and in testing it does seem to work, but I'd 
appreciate it if someone more familiar with the code could comment.

Problem 2)

That still doesn't completely solve the problem, though, because the various 
calls to a method may also edit the same file (as they do in the example 
above, which updates /tmp/foo), so in my real-world case the fix still doesn't 
work.

In fact, the problem is even worse than that: even if the parameters to the 
method differ, so that Problem 1 doesn't occur, any second execution of the 
method will still fail, this time because of the editfiles lock on /tmp/foo.
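
To make that concrete, here is an illustrative variant of the calling policy 
("wobble" is just a made-up second value).  Because the arguments differ, the 
method-dispatch lock from Problem 1 shouldn't be the obstacle, but going by the 
behaviour described above I'd expect only "wibble" to end up in /tmp/foo on a 
linux host, since the second invocation is refused the editfiles lock:

control:
  any::
    actionsequence = ( methods )

methods:
  linux::
    # first call: appends "wibble" to /tmp/foo
    TestMethod("wibble")
      action=cf.TestMethod returnv=null returnc=null
    # different arguments, but the same file is edited, so this call
    # runs into the editfiles lock on /tmp/foo
    TestMethod("wobble")
      action=cf.TestMethod returnv=null returnc=null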

I tried setting the editfiles stanza within the method to IfElapsed 0, but 
that didn't work either.

All in all, I think I'm probably screwed here.  Methods are the only feature 
of cfengine2 that really allows parameterisation, but you can't actually 
execute a single method more than once in a single run of cfagent.  Presumably 
this is an artefact of methods not inheriting any locks from their parent and 
instead behaving as completely independent cfagent processes, which was perhaps 
a slightly unfortunate implementation decision.

Am I missing something?

Does this all get better with cfengine3, or am I still going to be bitten by 
the same problems, especially Problem 2?

Have I got my approach completely arse-about-face, and is there a better way of 
achieving what I'm trying to do?

Thanks,

Tim

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
_______________________________________________
Help-cfengine mailing list
Help-cfengine@cfengine.org
https://cfengine.org/mailman/listinfo/help-cfengine
