Awight has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/346664 )

Change subject: Document a bit
......................................................................

Document a bit

Change-Id: I879936947ff4af826b173059e742637f9aeefaed
---
M README.md
A job.example.yaml
M process-control.example.yaml
3 files changed, 90 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/wikimedia/fundraising/process-control refs/changes/64/346664/1

diff --git a/README.md b/README.md
index 60b662e..e59c26c 100644
--- a/README.md
+++ b/README.md
@@ -8,59 +8,47 @@
 Configuration
 =======
 
-Global configuration must be created before you can run jobs (FIXME: works out
-of the box).  Copy the file /usr/share/doc/process-control/process-control.example.yaml
-to /etc/fundraising/process-control.yaml
+Global configuration must be created before you can run jobs (FIXME: Make this
+work out-of-the-box).  Copy the file
+/usr/share/doc/process-control/process-control.example.yaml
+to /etc/fundraising/process-control.yaml and customize it for your machine.
+
+You'll need to pick a service user, and make /var/log/process-control writable
+by that user.
 
 Job descriptions
 =======
 
-A job description file has the following format,
-
-```yaml
-name: Take This Job and Shove It
-
-# The commandline that will be run.  This is executed from Python and not from
-# a shell, so globbing and other trickery will not work.  Please give a full
-# path to the executable.
-#
-# Alternatively, a job can be configured as a list of several commands.  These
-# are executed in sequence, and execution stops at the first failure.
-#
-#command:
-#    # Run sub-jobs, each with their own lock and logfiles.
-#    - /usr/bin/run-job prepare_meal
-#    - /usr/bin/run-job mangia
-#    - /usr/bin/run-job clean_up_from_meal
-#
-command: /usr/local/bin/timecard --start 9:00 --end 5:30
-
-# Optional schedule, in Vixie cron format:
-# minute hour day-of-month month day-of-week
-schedule: "*/5 * * * *"
-
-# Optional flag to prevent scheduled job execution.  The job
-# can still be run as a single-shot.
-disabled: true
-
-# Optional timeout in minutes, after which your job will be
-# aborted.  Defaults to no timeout.
-timeout: 30
-
-# Optional environment variables.
-environment:
-       PYTHONPATH: /usr/share/invisible/pie
-```
+Each job is described in a YAML file under the /var/lib/process-control
+directory (by default).  See `job.example.yaml` for the available keys and
+their meanings.
 
 Running
 =======
+
 Jobs can be run by name,
     run-job job-a-thon
 which will look for a job configuration in `/var/lib/process-control/job-a-thon.yaml`.
 
-Some actions are shoehorned in, and can be accessed like:
+Other actions on jobs can be accessed like:
     run-job --list-jobs
-       run-job --kill-job job-a-thon
+
+Scheduled Jobs
+======
+
+Any job that includes a `schedule` key and does not have `disabled: true` can
+be automatically scheduled.  The schedule value is given as a five-term Vixie
+crontab (man 5 crontab), but aliases like `@daily` are not allowed.
+
+A script `cron-generate` will read all scheduled jobs and write entries to
+/etc/cron.d/process-control, or the configured `output_crontab`.  For example,
+a job `yak` with the schedule `30 12 * * *` will be written 
+
+Cron-generate takes no arguments; its configuration is read from /etc.
+
+    cron-generate
+
+All cron jobs 
 
 Failure detection
 ======
@@ -74,11 +62,23 @@
 * Non-zero subprocess exit code.
 * Timeout.
 
+Security
+======
+
+This tool was written for a typical environment where software developers and
+operations engineers have different permissions.  The design is supposed to
+make it reasonably safe for a group of developers to make auditable changes to
+job configuration without help from operations engineers, and it should not be
+possible for users to escalate privileges to anything but running processes as
+the service user.
+
+It should also not be possible to run arbitrary job descriptions from a user's
+home directory.  We recommend deploying the `job_directory` in a way that all
+changes can be audited.
 
 TODO
 ====
 
-* Syslog actions, at least when tweezing new crontabs.
 * Log invocations.
 * Prevent future job runs when unrecoverable failure conditions are detected.
 * Fine-tuning of failure detection.
diff --git a/job.example.yaml b/job.example.yaml
new file mode 100644
index 0000000..8115936
--- /dev/null
+++ b/job.example.yaml
@@ -0,0 +1,42 @@
+# Copy this job to your configured `job_directory` and give it a name, like
+# `purge_binge.yaml`.
+
+# Verbose job name.  The short, machine name is taken from the base file name.
+name: Take This Job and Shove It
+
+# The commandline that will be run.  This is executed from Python and not from
+# a shell, so globbing, redirecting, and other trickery will not work.  Please
+# give the full path to executables as in a crontab.
+#
+# Alternatively, a job can be configured as a list of several commands.  These
+# are executed in sequence, and execution stops at the first failure.
+#
+#command:
+#    # Run a command directly.
+#    - /usr/bin/puppet apply
+#
+#    # Run sub-jobs, each with their own lockfiles, logfiles, and timeout.
+#    # Remember to set the parent job's timeout to something long enough to cover
+#    # all sub-jobs, or to zero for unlimited.
+#    - /usr/bin/run-job prepare_meal
+#    - /usr/bin/run-job mangia
+#    - /usr/bin/run-job clean_up_from_meal
+#
+command: /usr/local/bin/timecard --start 9:00 --end 5:30
+
+# Optional schedule, in Vixie cron format:
+# minute hour day-of-month month day-of-week
+schedule: "*/5 * * * *"
+
+# Optional flag to prevent scheduled job execution.  The job
+# can still be run as a single-shot.
+#disabled: true
+
+# Optional timeout in minutes, after which your job will be
+# aborted.  Defaults to no timeout, or whatever is configured in
+# /etc/fundraising/process-control.yaml
+timeout: 30
+
+# Optional environment variables.
+environment:
+       PYTHONPATH: /usr/share/invisible/pie
diff --git a/process-control.example.yaml b/process-control.example.yaml
index 5623336..746af0a 100644
--- a/process-control.example.yaml
+++ b/process-control.example.yaml
@@ -20,11 +20,17 @@
         from_address: "Fail Mail <fr-t...@wikimedia.org>"
         to_address: "fr-t...@wikimedia.org"
 
+    # Make the default timeout ten minutes.  If this line is removed, jobs will
+    # have unlimited time to run.
     timeout: 10
 
 job_directory: /var/lib/process-control
 
 # Python logging config, https://docs.python.org/2/library/logging.config.html#logging-config-dictschema
+# These are reasonable defaults that will send process-control output to
+# syslog.  Note that stderr and stdout from the job's command will go to a
+# logfile in /var/log/process-control and not through the logging settings
+# here.
 logging:
     version: 1
     formatters:
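
Note on the `command` key in job.example.yaml: the comment there says commands are executed from Python and not from a shell, so globbing and redirection do not work. The sketch below illustrates that behaviour. It is an assumption about how such a runner could split and launch a command line (`shlex.split` plus `subprocess.run`), not the project's actual code; `run_command` and its `timeout_minutes` parameter are hypothetical names.

```python
import shlex
import subprocess
import sys


def run_command(commandline, timeout_minutes=None):
    """Hypothetical helper: run one command line without a shell.

    The string is split into an argument list, so characters such as
    "*" or ">" are passed to the program literally instead of being
    interpreted.  A timeout of None or 0 means unlimited.
    """
    argv = shlex.split(commandline)
    timeout = timeout_minutes * 60 if timeout_minutes else None
    completed = subprocess.run(
        argv,
        timeout=timeout,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return completed.returncode, completed.stdout.decode()


# The child process echoes its first argument; "*.yaml" is not expanded.
echo_argument = shlex.join(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "*.yaml"]
)
code, out = run_command(echo_argument, timeout_minutes=1)
```

A job that really needs globbing or redirection would have to wrap that logic in a script and give the script's full path as the command.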

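Note on the `schedule` key: the README text in this change says the value is a five-term Vixie crontab and that aliases like `@daily` are not allowed. A minimal sketch of that rule follows. It only checks the shape of the value and is not taken from the project; `is_valid_schedule` is a hypothetical name, and the individual terms are not checked against cron's field ranges.

```python
def is_valid_schedule(schedule):
    """Hypothetical check: exactly five terms and no "@" aliases."""
    terms = schedule.split()
    if len(terms) != 5:
        return False
    return not any(term.startswith("@") for term in terms)


five_terms_ok = is_valid_schedule("*/5 * * * *")
alias_rejected = not is_valid_schedule("@daily")
four_terms_rejected = not is_valid_schedule("30 12 * *")
```
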
-- 
To view, visit https://gerrit.wikimedia.org/r/346664
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I879936947ff4af826b173059e742637f9aeefaed
Gerrit-PatchSet: 1
Gerrit-Project: wikimedia/fundraising/process-control
Gerrit-Branch: master
Gerrit-Owner: Awight <awi...@wikimedia.org>
