On Wed, 27 May 2015 19:09:57 +1000
Adam Zegelin <a...@instaclustr.com> wrote:

> Hi list,
> 
> I’m running Cassandra (C*, a clustered database) as a systemd service. 
> Currently this is just a “Type=simple” service; as such, dependent units will 
> start as soon as the C* process starts rather than when C* is accepting 
> client connections.
> 
> I’d like to transition to something more complex so I can start to write 
> additional units that depend on C*.
> 
> I’ve successfully managed to set the service type to “notify” and modify C* 
> to call sd_notify() when it is ready to accept client connections.
> Further experimentation reveals that this is not an ideal solution. C* can 
> take a long time (minutes to _hours_) to reach the point where it will accept 
> client connections/queries. The default startup timeout is 90s, which causes 
> the service to be marked failed if exceeded, hence C*, with its long startup 
> times, will often never get the chance to transition to “active”.
> 
> 
> Part of the issue for me is trying to define what “active” means. The man 
> page, for “Type=forking” services, says: “The parent process is expected to 
> exit when start-up is complete and all communication channels are set up”. 
> I’m assuming for “notify” services, sd_notify() should be called when 
> “start-up is complete and all communication channels are set up”. Even if 
> this takes hours?
> 
> Cassandra exposes a number of inet ports of interest:
> - Client connection ports for running queries via Cassandra Query Language 
> (CQL)/Thrift (RPC) — this is what most clients use to query the database 
> (i.e., to run `SELECT * FROM …` style queries)
> - JMX (Java Management Extensions) for performing management operations — the 
> C* and 3rd-party management tools use this to call management functions and 
> to collect statistics/metrics about the JVM and C*.
> 
> The JMX socket is available a few seconds after the process is running.
> 
> The CQL/Thrift ports can take far longer to become available — sometimes 
> hours after the process starts. Cassandra only starts listening on these 
> ports once it has joined the cluster of nodes & has synchronised its state. 
> State synchronisation may require bootstrapping & copying large amounts of 
> data across the network and hence take a long time to complete.
> 
> Currently my dependent C* client units simply spin-wait, attempting to 
> establish a connection to C*. This seems like duplicated effort and makes 
> these services more complex than they need to be.
> 
> My original thought was to just disable the startup timeout on the C* unit, but 
> that means the unit will stay “activating” for a long time. It also means that 
> JMX clients, which can establish connections almost immediately, would have 
> their startup deferred unnecessarily.
> 

I suppose this could be generalized to a service announcing different
tokens to systemd, with other services depending on these tokens.
This could be used by a single binary offering multiple client
interfaces, where each interface can be independently up or down.

Hmm ... this sounds suspiciously like what D-Bus does. Did you consider
using D-Bus in your application? 

But for now there is no way to express such a dependency in systemd.
D-Bus is the exception: you can make services dependent on D-Bus
endpoints.

> Ideally I’d like to be able to write units that can depend on individual 
> ports being available from a process — i.e, when the CQL port is available, 
> start the client unit(s) and when JMX is available, start a monitoring 
> service. Is this possible with systemd?
> 
> Alternatively, I was thinking that I could write some kind of simple 
> process/script that attempts a connection, and exits with failure if the 
> connection cannot be established, or success if it can. I’d then write a unit 
> file, e.g. `cassandra-cql-port.service`:
>       [Unit]
>       # not really sure what combo of 
>       # Wants/Requires/Requisite/BindsTo/PartOf/Before/After is needed
>       Requisite=cassandra.service
> 
>       [Service]
>       Type=oneshot
>       RemainAfterExit=true
>       ExecStart=/opt/bin/watch-port 9042
>       Restart=on-failure
>       RestartSec=1min
>       StartLimitInterval=0
> 
> My client units could then want/require this unit. Is this a valid approach?
> 

Yes, it is. Make a service that waits for the specific port to become
available and order all clients after it.
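
For what it is worth, the port check itself can be very small. Below is
a minimal sketch of what a watch-port helper might look like; it is only
an illustration, not your actual /opt/bin/watch-port. The function names
and the assumption that Cassandra listens on 127.0.0.1 are mine.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection() raises OSError (refused, timed out, ...)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def main(argv):
    """Exit status for the unit: 0 if the port accepts connections, else 1."""
    port = int(argv[1])
    # Assumption: the service listens on the loopback address.
    return 0 if port_open("127.0.0.1", port) else 1
```

A wrapper script would end with sys.exit(main(sys.argv)), so that a
refused connection becomes a non-zero exit status and the unit counts
as failed.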

> Or am I walking down the wrong path to use systemd to manage this?
> 

Well, systemd is more focused on a "one unit - one service" design, where
there is a well-defined master process that represents "the service". The
case of a single unit offering (or starting) multiple independent
services does not really fit well here.

I wonder - can your master service trigger startup of clients when it is
ready? Note that this can be done in a completely generic way - it can
simply start something like cassandra.target, and you can plug any
client into this target.
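
A rough sketch of that trigger, assuming a cassandra.target unit exists
and that the helper is called once C* is ready. The target name and both
function names are hypothetical; the only real interface used is
"systemctl start --no-block".

```python
import subprocess


def start_target_command(target="cassandra.target"):
    """Build the systemctl call that activates the (hypothetical) target."""
    # --no-block enqueues the start job and returns immediately, so the
    # caller does not wait for every client unit to finish starting.
    return ["systemctl", "start", "--no-block", target]


def announce_ready(target="cassandra.target", runner=subprocess.run):
    """Run the command; 'runner' is injectable so it can be tested offline."""
    return runner(start_target_command(target), check=True)
```

Client units would then be pulled in by the target (for example through
WantedBy=cassandra.target in their [Install] sections), so adding a new
client needs no change to the C* unit itself.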
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
