On Wed, Dec 30, 2009 at 8:40 PM, Tim Serong <tser...@novell.com> wrote:
> On 12/30/2009 at 05:30 AM, Lars Ellenberg <lars.ellenb...@linbit.com> wrote:
>> On Mon, Dec 28, 2009 at 08:00:24PM -0700, Tim Serong wrote:
>> > IMO a solution that doesn't rely on shared storage is preferable.
>>
>> how about:
>>  lsof | sed > outfile && csync2 -xv outfile ?

I also think csync2 can do the job. What I want to point out here is
the 'lsof -nP -i4...@$ocf_reskey_ip -F nT' command. One thing I observed
in my testing is that lsof can't grab all of the established TCP
connections, while netstat can. In other words, lsof's output is not the
same as that of 'netstat -tn'. I don't know why and haven't investigated
it much yet.
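
To make the dump-and-sync idea concrete, here is a rough sketch using
netstat instead of lsof (since that is what matched reality in my
testing); the status file path is just an example, and 'csync2 -xv'
with a file argument is used as in Lars's suggestion:

    #!/bin/sh
    # Sketch only: dump the established TCP connections terminating at
    # the managed IP into a status file, then let csync2 push it to
    # the other nodes.
    ip="${OCF_RESKEY_ip}"
    statefile="/var/run/resource-agents/tickle-${ip}.state"

    # In 'netstat -tn' output, column 4 is the local address:port and
    # column 6 is the state; keep only connections to our virtual IP.
    netstat -tn | awk -v ip="$ip" \
        '$6 == "ESTABLISHED" && index($4, ip ":") == 1 { print $4, $5 }' \
        > "$statefile"

    # Propagate the fresh status file to the peers (best effort).
    csync2 -xv "$statefile"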

>>
>> that is, generate a local status file,
>> then "rsync" it to the rest of the cluster nodes?
>>
>> maybe add a "invoke_sync_script" parameter,
>> which, if present, will be invoked with a single argument,
>> the status file, after it has been updated.
>> that script can then do csync2, rsync, scp, whatever.
>> should of course have appropriate timeouts, possibly
>> background itself, ...
>
> That's not bad...  Means the status file(s) can live somewhere "normal",
> like under /var, and removes any dependency on shared storage.  Also
> not reliant on any particular messaging layer (although does require
> setup & configuration of csync2 [or whatever], which may or may not
> otherwise be necessary, depending on the deployment).

Setting up and configuring csync2 should be OK since we also plan to
use it as a standard way to sync some other configuration files.
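
And the invoke_sync_script hook Lars described could look roughly like
this inside the monitor action (with $statefile as in the sketch above;
the 30-second timeout, the use of timeout(1) and ocf_log are only
assumptions for illustration):

    # After the status file has been refreshed, hand it to the
    # user-supplied sync script, if one is configured.
    if [ -n "${OCF_RESKEY_invoke_sync_script}" ]; then
        # Background it and bound it with a timeout, so a slow or
        # hung sync can never block the monitor action.
        (
            timeout 30 "${OCF_RESKEY_invoke_sync_script}" "$statefile" \
                || ocf_log warn "sync script failed for $statefile"
        ) &
    fi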

>
> What's the worst case load a regular sync of this nature could result
> in?  (I'm thinking monitoring every few seconds, multiple IPs on
> multiple nodes, resultant multiple syncs...)

This may need some time to investigate, as I haven't looked much into
how csync2 works or what its worst-case performance is.

>
>> We only need to do "best effort" here anyways.
>> btw, if mtime of status file is older than $something,
>> tickles should probably be skipped...
>
> Yep.
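
To make that staleness check concrete, something like this could sit in
front of the tickle step (with $statefile as in the earlier sketch; the
300-second threshold is an arbitrary example, and GNU stat is assumed):

    # Skip the tickle ACKs if the status file has not been refreshed
    # recently enough to be trusted; sending stale tickles could do
    # more harm than good.
    max_age=300
    now=$(date +%s)
    mtime=$(stat -c %Y "$statefile" 2>/dev/null || echo 0)
    if [ $(( now - mtime )) -gt "$max_age" ]; then
        skip_tickle=1
    fi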
>
>> > > Another important thing I think we should address is whether the
>> > > tickle feature should be added to the IPaddr2 RA. When you deploy
>> > > your HA solution, sometimes the application service is configured
>> > > to start after IPaddr2, but sometimes IPaddr2 is configured as the
>> > > first-started resource and the application starts afterwards. In
>> > > the latter case, if you send the tickle ACKs when IPaddr2 starts
>> > > but the real service application is not up yet, the user may see an
>> > > error like "Port is not reachable", which is not good for usability.
>> > >
>> > > So we may need to start the tickle when the application is ready.
>> > >
>> > > One simple implementation of this is to put the tickle feature in
>> > > a separate RA and add it as the last resource in the service group
>> > > when you deploy it. Does this make sense? If yes, I'll implement it :)
>> >
>> > Yes, this is a good point.  It may be that we actually want to do
>> > something like this:
>> >
>> >   start:
>> >     1) add iptables rule to drop incoming packets to IP address
>> >     2) bring up IP address
>> >     3) bring up HA service (database, storage, web server, whatever)
>> >     4) remove iptables blocking rule
>> >     5) perform tickle ack
>> >
>> >   stop (reverse of above, but fewer steps necessary):
>> >     1) add iptables rule to drop incoming packets to IP address
>> >     2) stop HA service
>> >     3) bring down IP address
>> >
>> >
>> > In the "start" case, I can imagine the IPaddr2 RA doing steps 1 and 2,
>> > whatever existing RA(s) doing step 3, then a separate "tickle" RA doing
>> > steps 4 and 5.  Likewise in reverse for stop.  Without something like
>> > this, there's at least two windows of opportunity where clients are
>> > either refused, or see the connection close (between steps 2 & 3 during
>> > "start", and any time after step 2 in "stop" when doing a clean migrate
>> > from one node to another).
>>
>> So better integrate it into the portblock RA?
>> on "action=unblock start", send tickles.
>> on "action=unblock stop", save status one last time.
>> (so it will be available after a clean switchover,
>> in case connections have not been cleanly shutdown)
>
> That'd do it :)
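
As a very rough sketch of how the "action=unblock" part of a
portblock-style RA could tie these steps together ("tickle_tcp" and
"save_connections" are placeholder names here, and the iptables rule is
only one example of "drop incoming packets to the IP"):

    ip="${OCF_RESKEY_ip}"
    statefile="/var/run/resource-agents/tickle-${ip}.state"

    unblock_start() {
        # Step 4: remove the blocking rule inserted at the start of
        # the group...
        iptables -D INPUT -d "$ip" -p tcp -j DROP
        # ...then step 5: wake up the clients that were connected
        # before the switchover (placeholder for the real tickle tool).
        tickle_tcp < "$statefile"
    }

    unblock_stop() {
        # Step 1 of "stop": block new traffic to the IP again...
        iptables -I INPUT -d "$ip" -p tcp -j DROP
        # ...and save the connection state one last time, so it is
        # still available after a clean switchover in case connections
        # have not been cleanly shut down.
        save_connections > "$statefile"
    }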
>
>> on "probe" (monitor_0) do nothing!
>> or you'd truncate the status file ;)
>
> Hang on, what then becomes responsible for performing the monitor that
> periodically updates the status file?  (sorry, my brain seems to have
> decided to shut itself down for the evening).

That should mean you don't save the connections on "probe", which runs
before "start". But once the resource has been started, each regular
"monitor" should save the connections.

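In RA terms, a rough sketch of what I mean (check_status and
save_connections are hypothetical helpers; the probe is recognised by
its interval of 0, which Pacemaker exposes as
OCF_RESKEY_CRM_meta_interval):

    monitor() {
        if [ "${OCF_RESKEY_CRM_meta_interval:-0}" -eq 0 ]; then
            # probe (monitor_0), run before start: only report status,
            # never rewrite the status file, or we would truncate it.
            check_status
            return $?
        fi
        # Regular recurring monitor: refresh the saved connections.
        check_status || return $?
        save_connections > "$statefile"
        return $OCF_SUCCESS
    }
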
Thanks,
Jiaju
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
