On Friday, January 10, 2014 3:57:03 PM UTC-5, jcbollinger wrote:
>
>
>
> On Thursday, January 9, 2014 6:23:23 PM UTC-6, Patrick Hemmer wrote:
>>
>> There's been an idea floating in my mind for quite a while now about 
>> using a job queue for compiling puppet catalogs. I just mentioned the idea 
>> on IRC and a few people really liked the idea, so I thought I'd bring it up 
>> here and get other thoughts on it.
>>
>> The idea is that instead of puppet masters compiling catalogs on demand, 
>> they would operate out of a job queue:
>>
>>    - When a node's cert is signed, the compilation queue gets a job for 
>>    that node.
>>    - When a compile job finishes, that node gets added to the queue at 
>>    the very end. This results in the puppet master constantly compiling 
>>    catalogs.
>>    - When a catalog changes, the puppet master notifies the node that it 
>>    needs to fetch the updated catalog and do a puppet run.
>>    - When an exported resource changes, any nodes which collect that 
>>    resource should get their compilation jobs moved to the front of the 
>> queue. 
>>    (This should be optional, as the behavior might not be desired)
>>    - Nodes would still run puppet out of a cron job, but they would use 
>>    the cached catalog, not request one from the master.
>>    - If any of the facts used in the catalog change, the node would 
>>    notify the master and the compilation job would move to the front of the 
>>    queue.
>>
>> In the world of cloud computing, this becomes extremely beneficial, 
>> though it still is advantageous for traditional datacenter environments:
>>
>>    - Puppet runs are computationally expensive for the client. With this 
>>    design, the client could have its cron job set to a very infrequent value 
>>    (once per hour or so). This way small instances without many available 
>>    resources won't waste them on puppet runs that don't change anything.
>>    - The masters could be very beefy instances with a lot of CPU. By 
>>    constantly generating catalogs, you ensure that the system's resources 
>>    aren't sitting idle and being wasted.
>>    - By using a queuing mechanism, you can determine how loaded the 
>>    masters are. If the oldest job in the queue is X minutes old, you can 
>> spin 
>>    up another master.
>>    - With the current method of generating catalogs on demand, if a lot 
>>    of nodes request a catalog at the same time, it can cause compilations to 
>>    go very slow. If they go too slow, the client will get a timeout and go 
>>    back to the master to request another catalog, even though the first 
>>    compilation completed fine and just took a while. With queuing, the 
>>    master can compile exactly X catalogs at the same time. It can even be 
>>    configured to only start a new compilation job if the system load is 
>>    less than X.
>>    - Since puppet has to serve up both files and catalogs, if all the 
>>    available processes are used compiling catalogs, requests for files end 
>> up 
>>    hanging. By moving catalog compilations to a background worker, file 
>>    requests will be faster.
>>    (you can implement a workaround for this: I have 2 puppet master 
>>    pools, catalog requests get routed to one pool, everything else to the 
>>    other pool, but this isn't a simple or standard design)
>>
>> Now most of this could be done on top of puppet. You could create a 
>> system which reads from a queue, performs a `puppet master compile`, 
>> notifies the client if the catalog changes, etc. But there are a few sticky 
>> points such a system wouldn't be able to handle (just missing features; none 
>> would prevent the system from working):
>>
>>    - There is no way of determining which facts were used in a catalog 
>>    compilation. A client could issue a catalog compilation request when any 
>>    fact changes, but facts like `uptime_seconds` always change, so it would 
>>    always result in a catalog compilation. The proper way to handle this 
>> would 
>>    be to monitor when a fact variable is read, then add a list of "facts used" 
>>    and their values to the resulting catalog. The puppet agent could then use 
>>    that to see whether the facts used have changed.
>>    This may not be a significant issue though. If the `puppet agent` 
>>    cron job is set to something very infrequent, it won't be requesting 
>>    catalog compilations very often.
>>    - The catalog doesn't indicate whether a resource was collected from 
>>    another node. So when an exported resource changes, you wouldn't be able 
>> to 
>>    find nodes which collect those exported resources and recompile those 
>>    catalogs. Now this isn't a big deal since the current method of using 
>> cron 
>>    jobs can't even do this.
>>    The catalog does indicate that a resource is exported, so you could 
>>    look for nodes with resources of the same type & title as the exported 
>>    resource. But it's possible for a resource to have the same type & title as 
>>    an exported resource, as long as the exported resource isn't collected 
>>    during the same catalog compilation, so this method might end up 
>>    recompiling catalogs which don't use the exported resource.
>>
>> Puppet could probably be monkey patched to address these and add the data 
>> to the catalog. The script system could then extract the info, while the 
>> puppet agent would just ignore the data.
>>
>>
>> Thoughts? I might be up for coding this, but it would be low on the 
>> priority list and I wouldn't get to it for a long time.
>>
>>
>
> The key idea here seems to be to improve the master's average response 
> time for catalog requests by pre-compiling and caching catalogs, then 
> serving those cached catalogs on demand.  There is a secondary idea of the 
> master automatically keeping its cache fresh, and a tertiary idea of the 
> master hinting to agents when it thinks their most recently-retrieved 
> catalog may be out of date.  There is a separate, but coupled idea that the 
> agent might keep track of when significant facts change, and thereby be 
> better informed from its side whether its catalog is out of date.
>
The key idea is also about not wasting resources. The clients waste a lot 
of resources doing puppet runs that change nothing. The puppet master is 
generally a box meant to compile catalogs. It's going to have the 
resources, so it should spend its time doing just that so the clients 
don't waste their time.
You can have a hundred clients frequently doing puppet runs because they 
can't know whether the catalog has been updated (which also uses the 
master's resources), or you can have a few masters chewing through the 
catalogs and notifying the clients when they need to run. Which do you 
think is more efficient? 
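
To make that concrete, here's a rough sketch of the worker loop I'm picturing. 
None of this exists in Puppet today: it assumes a Redis list as the job queue, 
shells out to `puppet master --compile` (the exact invocation may differ by 
version), and the notify_agent() hook is entirely hypothetical (MCollective, a 
plain HTTP callback, whatever):

    # Sketch only: continuously-cycling compile queue (Python; Redis assumed)
    import hashlib
    import subprocess

    import redis

    r = redis.Redis()

    def compile_catalog(node):
        # "puppet master --compile <node>" prints the compiled catalog to
        # stdout; exact invocation may differ by Puppet version
        return subprocess.check_output(["puppet", "master", "--compile", node])

    def notify_agent(node):
        # Hypothetical hook: tell the agent its catalog changed so it fetches
        # the cached copy and does a run (MCollective, HTTP callback, etc.)
        pass

    def worker():
        while True:
            node = r.blpop("catalog-queue")[1].decode()
            catalog = compile_catalog(node)
            digest = hashlib.sha256(catalog).hexdigest()
            old = r.hget("catalog-digests", node)
            if old is None or old.decode() != digest:
                r.hset("catalog-digests", node, digest)
                r.hset("catalog-cache", node, catalog)
                notify_agent(node)
            # push the node back onto the tail so the queue keeps cycling
            r.rpush("catalog-queue", node)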
 

> Fundamentally, this is really about catalog caching.  The specific 
> suggestion to implement the master's part of it via a continuously cycling 
> queue of compilation jobs is mostly an implementation detail.  Although 
> it's hard to argue with the high-level objectives, I'm not enthusiastic 
> about the proposed particulars.
>
> As far as I can determine, the main point of a *continuously-cycling* queue of 
> compile jobs is to enable the master to detect changes in the 
> compiled results.  That's awfully brute-force.  Supposing that manifests, 
> data, and relevant node facts change infrequently, compilation results also 
> will rarely change, and therefore continuously recompiling catalogs would 
> mostly be a waste of resources.  The master could approach the problem a 
> lot more intelligently by monitoring the local artifacts that contribute to 
> catalog compilation -- ideally on a per-catalog basis -- to inform it when 
> to invalidate cached catalogs.  Exported resource collections present a bit 
> of a challenge in that regard, but one that should be surmountable.
>
There is no way around it. You can't just rely on facts or exported 
resources changing before re-compiling a catalog. Let's say you have a 
function that generates different results based on some external variable: 
no fact or exported resource has changed, but the catalog will be different. 
While this may not be common, I think it would be a bad idea to build an 
implementation that can't handle it.
However, it would be trivial to change it so that instead of constantly 
compiling catalogs, it re-compiles any catalog that is more than X minutes 
old. This would be equivalent to the puppet agent cron job on a client.
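
For example, keeping the same made-up Redis layout from the sketch above, the 
worker would stop pushing the node straight back onto the queue and instead 
record a timestamp; a small feeder then re-queues anything older than X minutes:

    # Sketch only: re-queue nodes whose catalog is older than MAX_AGE
    import time

    MAX_AGE = 30 * 60  # seconds; roughly the agent cron interval today

    def record_compile(r, node):
        # the worker calls this after each compile instead of r.rpush(...)
        r.hset("compiled-at", node, time.time())

    def requeue_stale(r):
        now = time.time()
        for node, ts in r.hgetall("compiled-at").items():
            if now - float(ts.decode()) > MAX_AGE:
                r.rpush("catalog-queue", node)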
 

>
> Furthermore, if we accept that master-side catalog caching will be 
> successful in the first place, then it is distinctly unclear that compiling 
> catalogs prospectively would provide a net advantage over compiling them on 
> demand.  Prospective compilation's only real advantage over on-demand 
> compilation is better load leveling in the face of clumped catalog requests 
> (but only where multiple requests in the clump cannot be served from 
> cache).  On the other hand, prospective compilation will at least 
> occasionally yield slower catalog service as a catalog request has to wait 
> on other catalog compilations.  
>
You've lost me here. Why would it be slower? Pre-compiling catalogs means 
the client doesn't have to wait at all. Instead of waiting 30 seconds for a 
catalog to be compiled, it'll take less than a second: whatever the response 
time is to fetch it from the cache and transfer it over the network.
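
Just to illustrate the request path (same made-up Redis layout as above): a 
catalog request becomes nothing more than a cache lookup, and a miss simply 
queues a compile job rather than making the client wait on a compile:

    # Sketch only: serving catalog requests straight from the cache
    def handle_catalog_request(r, node):
        catalog = r.hget("catalog-cache", node)
        if catalog is not None:
            return catalog  # sub-second: cache fetch plus network transfer
        # no cached copy yet (e.g. cert just signed): queue a compile job;
        # the agent keeps using its last cached catalog until notified
        r.lpush("catalog-queue", node)
        return None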
 

> Also, prospective compilation will sometimes devote resources to compiling 
> catalogs that ultimately go unused.
>
That's the point: if the master's resources are sitting unused, they're 
being wasted anyway, so spending them on the occasional compilation that 
goes unused costs nothing.
 

>  
>

> John
>
>
