Hi,

On Mon, Apr 14, 2008 at 04:11:50PM -0400, Rob Morin wrote:
> Thanks again for replying quickly... I did not know that you can use 2 IP
> addresses for heartbeat.... so with a config like that you're saying I can
> independently have either web or mail moved over... I see, not bad..  :)
> perhaps I can work on this later on....
>
> However for now my boss wants some sort of failover to be working very 
> soon,

Hmm. Perhaps your boss should be made aware, if at all possible,
that high availability doesn't come as quickly and painlessly as
he's obviously expecting. Once the cluster is set up, which should
happen only after research and study, it should also be tested
very thoroughly. Otherwise, you might end up with a less available
solution than a single host.

> With the configs I showed, as is, if Joe goes down Stewie should take
> over, correct?

Right.

> Also if i understand you i should be using both serial and eth0 as my 
> heartbeat?

Or eth0 and eth1.
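
As a rough sketch only (untested against your setup), the media
section of ha.cf with two independent links could look like this.
It assumes eth1 is up on both nodes and that broadcasting heartbeats
on it is acceptable. The serial lines are the ones already commented
out in the ha.cf you posted:

```
# Two heartbeat paths: the private and the public interface.
bcast eth0 eth1

# Alternative second path: a null-modem cable between the nodes.
#baud 115200
#serial /dev/ttyS0
```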

> That's the thing with the docs: if you can find what you're looking for, it's
> handy, but the problem is finding it first...  :)
>
> I have a maintenance window tonight at 11pm EST and I would like to try a
> failover test. Do you think from my configs it should be OK? I know DRBD is
> working fine, I have no problems there......

Can't really say. You should test all your resources which are of
type LSB: see http://www.linux-ha.org/LSBResourceAgent.
Stop/start/status/double start/double stop should all work as
expected. Otherwise, sooner or later it's going to make a mess of
your cluster.
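
A minimal sketch of such a test in plain sh follows. The function
names lsb_expect and lsb_check are made up for illustration. The
expected exit codes (0 for start and stop even when repeated, 0 for
status of a running service, 3 for status of a stopped one) are what
the LSB spec asks for. It really stops and starts the service, so
run it only where that is safe, e.g. in your maintenance window:

```shell
#!/bin/sh
# Hypothetical helper: run one init-script action and compare its exit code.
lsb_expect() {
    script=$1 action=$2 want=$3
    "$script" "$action" >/dev/null 2>&1
    got=$?
    if [ "$got" -eq "$want" ]; then
        echo "ok   $action -> $got"
    else
        echo "FAIL $action -> $got (expected $want)"
        lsb_failed=1
    fi
}

# Sequence: start, status, double start, stop, status, double stop.
# Returns 0 only if every step gave the expected exit code.
lsb_check() {
    lsb_failed=0
    lsb_expect "$1" start  0
    lsb_expect "$1" status 0   # service should be running now
    lsb_expect "$1" start  0   # double start must still succeed
    lsb_expect "$1" stop   0
    lsb_expect "$1" status 3   # LSB: 3 means "not running"
    lsb_expect "$1" stop   0   # double stop must still succeed
    return "$lsb_failed"
}
```

Usage would be something like "lsb_check /etc/init.d/apache2",
repeated for every init script named in your haresources.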

> Also another question: the server names joe and stewie, should I be using
> them fully qualified in the conf file, i.e. joe.domain.com rather than just
> joe?

You should use whatever 'uname -n' prints.
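
A quick way to check this on each node, as a sketch: the function
name check_node_name is made up, and /etc/ha.d/ha.cf in the usage
line is an assumption about where your ha.cf lives. It strips
comments and looks for the given name on any "node" line:

```shell
#!/bin/sh
# check_node_name NAME FILE
# Succeeds if FILE contains a "node" directive that lists NAME exactly.
check_node_name() {
    name=$1 conf=$2
    # Drop trailing comments first so words inside them cannot match.
    sed -e 's/#.*//' "$conf" | awk -v n="$name" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }'
}
```

For example: check_node_name "$(uname -n)" /etc/ha.d/ha.cf || echo "node name mismatch"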

> And that FQDN should point to the public IP, not the private one,
> right....?

No. Your nodes have their own IP addresses, which should be known
only to the admins. The public IP is a virtual IP not tied to a
particular host. Its name could be something like web.domain.com.
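
To illustrate, here is a hedged sketch of a haresources file with
two groups, each with its own virtual IP, along the lines of the
two-address setup discussed earlier in the thread. The second
address (xxx.xxx.xxx.151) is a placeholder, and which service goes
with which filesystem (mysql in particular) is my guess, so adjust
it to your real dependencies:

```
# Web group, preferred node joe (one group per logical line).
joe IPaddr::xxx.xxx.xxx.150 \
    drbddisk::web Filesystem::/dev/drbd1::/var/www::ext3::defaults \
    apache2 mysql ispcp_daemon

# Mail group, preferred node stewie; the .151 address is hypothetical.
stewie IPaddr::xxx.xxx.xxx.151 \
    drbddisk::mail Filesystem::/dev/drbd0::/var/mail/virtual::ext3::defaults \
    postfix courier-authdaemon courier-pop courier-imap
```

Each virtual IP would then get its own name in DNS, and a problem
in one group should leave the other group alone.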

Thanks,

Dejan

> sorry for being a pain....
>
> Thanks so much for all your help!
>
>
> Rob Morin
> Dido Internet Inc.
> Montreal,Canada
> http://www.dido.ca
> 514-990-4444
>
>
>
> Dejan Muhamedagic wrote:
>> Hi,
>>
>> On Mon, Apr 14, 2008 at 01:21:56PM -0400, Rob Morin wrote:
>>   
>>> Thanks for the reply Dejan....
>>>
>>> Our company has a policy to use Debian apt-get packages only. To my
>>> knowledge what I am running is the latest available for AMD64 in apt-get;
>>> I know it's not the latest.
>>>
>>> |The point of having a service in a cluster is to make
>>> |it more available, right? So, if your service is unstable, it
>>> |should first be fixed.
>>>
>>> Yes, I agree with the above, but is it not better to have pop, imap, mysql
>>> and mail delivery working while apache is down? Clients accessing some
>>> services will be happier than having everything down...??
>>>     
>>
>> Agreed. However, in the setup you have, it is impossible. If you
>> have a single IP address as a dependency for all other resources,
>> then any resource failure will result in a failover of all
>> resources. The v1/haresources resource manager is not that
>> versatile and it is simply not possible to express what you want,
>> i.e. one dependency for all (the IP address) but that some of the
>> other resources are independent of each other. For that you'd
>> have to use v2/crm.
>>
>>   
>>> Implementing heartbeat & drbd ended up being more complicated on a production
>>> system than I expected... I had to integrate into a live system what I tested
>>> on a dev system... it's not as easy as 1 2 3. I found the HA website a bit
>>> confusing as there seems to be a mix in documentation between
>>> versions.....
>>
>> Yes, it is generally agreed that the documentation is not
>> very good. So far nobody has managed to fix it (and many have tried).
>> Personally, I find it not too bad, just not well organized. Once
>> you get a grip on concepts, terminology, and clusters in general,
>> it boils down to searching the site (or wiki.linux-ha.org).
>> Unfortunately, that's not an option if one's in a hurry, and
>> quite a few people are.
>>
>>   
>>> so this is why i am asking questions here... I wanted the heartbeat 
>>> implementation as simple as possible.... we use one IP for web, pop, 
>>> mysql and imap...
>>>     
>>
>> You would be better off with more than one IP address. In that
>> case, you could also have an active-active configuration. And,
>> more importantly, fewer artificial dependencies. I would go with at
>> least two: one for the mail and one for the web services.
>> Otherwise, a failure of one resource could bring down healthy
>> services. For example, according to your haresources, a failed
>> mail filesystem would stop not only mail services but also web
>> services.
>>
>>   
>>> I did realize I had an error in my haresources file, however: I had the
>>> wrong server name in there as primary. I should have had Joe rather than
>>> stewie; I changed that just before I sent my email.... I prefer to use
>>> the haresources file for config as I found the xml config a bit confusing
>>> when I tried to use it....
>>>     
>>
>> Admittedly it has a rather steep learning curve and it takes some
>> time to get used to it. You could still create an initial XML
>> configuration using the conversion tool (see
>> http://www.linux-ha.org/ClusterInformationBase/Conversion).
>>
>>   
>>> I do not understand when you say I need 2 comm links... you mean for
>>> replication of data, or for the heartbeat itself...?
>>>     
>>
>> For heartbeat. Otherwise you risk a split-brain which is a very
>> bad thing for clusters.
>>
>> Thanks,
>>
>> Dejan
>>
>>   
>>> I very much appreciate your input, thanks again...
>>>
>>> Rob Morin
>>> Dido Internet Inc.
>>> Montreal,Canada
>>> http://www.dido.ca
>>> 514-990-4444
>>>
>>>
>>>
>>> Dejan Muhamedagic wrote:
>>>     
>>>> Hi,
>>>>
>>>> On Mon, Apr 14, 2008 at 09:22:08AM -0400, Rob Morin wrote:
>>>>         
>>>>> Hello all my first post here so be gentle....  :)
>>>>>
>>>>> I have already set up DRBD and Heartbeat-2 on 2 Debian Etch servers.
>>>>> Primary named Joe, secondary named Stewie.
>>>>> DRBD version 8 via apt-get and heartbeat-2 via apt-get version 2.0.7-2
>>>>>             
>>>> 2.0.7-2 is rather old. You would want to upgrade, in particular
>>>> if you run v2/crm style configurations.
>>>>
>>>>         
>>>>> I am using 2 NICS, eth0 which is private for DRBD replication and 
>>>>> heartbeat and eth1 used for my real public IP address where outsiders 
>>>>> connect to for the services.
>>>>>             
>>>> See below.
>>>>
>>>>         
>>>>> I am not using heartbeat yet, but I am using drbd, as I am having
>>>>> trouble getting heartbeat to take over on the secondary server (Stewie).
>>>>> The problem is Apache is dying for some reason... however I would like
>>>>> the other resources to start, such as pop and mail and a couple of
>>>>> others.. I figure it's better to have only one service dead, such as web,
>>>>> rather than all services dead...
>>>>>
>>>>> My question is, is it possible to have heartbeat ignore a problem when 
>>>>> a problem or error occurs starting up a service?
>>>>>             
>>>> In v1 probably not, but other services/groups shouldn't be
>>>> affected. The point of having a service in a cluster is to make
>>>> it more available, right? So, if your service is unstable, it
>>>> should first be fixed.
>>>>
>>>>         
>>>>> It is hard to troubleshoot a problem when it occurs, as heartbeat
>>>>> gives up if it encounters one error....
>>>>>             
>>>> Why should it be hard to troubleshoot? There are logs I guess.
>>>>
>>>>         
>>>>> Also I noticed in the ha.cf file there is a comment that says
>>>>> "# Node name must be same as uname -r."
>>>>>
>>>>> So I have "Joe" and "Stewie" as my hostnames but if I do a uname -r on
>>>>> either host I get this in return
>>>>>
>>>>> 2.6.18-6-amd64
>>>>>             
>>>> That must be a typo. It should read 'uname -n'.
>>>>
>>>>         
>>>>> Could this be an issue... here are my conf files....
>>>>>
>>>>>
>>>>> ha.cf file
>>>>> -------------------------------------
>>>>> logfacility     daemon        # This is deprecated
>>>>> keepalive 2                   # Interval between heartbeat (HB) packets.
>>>>> deadtime 60                   # How quickly HB determines a dead node.
>>>>> warntime 5                    # Time HB will issue a late HB.
>>>>> initdead 120                  # Time delay needed by HB to report a dead node.
>>>>> udpport 694                   # UDP port HB uses to communicate between nodes.
>>>>> #ping 192.168.5.1              # Ping VMware Server host to simulate network resource.
>>>>> bcast eth0
>>>>>             
>>>> You need at least two comm links for production servers. Another
>>>> link could be your public network interface.
>>>>
>>>>         
>>>>> #baud 115200
>>>>> #serial /dev/ttyS0              # Which interface to use for HB packets.
>>>>> coredumps true
>>>>> auto_failback off             # Auto promotion of primary node upon return to cluster.
>>>>> node    joe      # Node name must be same as uname -r.
>>>>> node    stewie      # Node name must be same as uname -r.
>>>>> ###
>>>>> respawn hacluster /usr/lib/heartbeat/ipfail
>>>>> # Specifies which programs to run at startup
>>>>>
>>>>> ------------------------------------------------------------
>>>>>
>>>>>
>>>>> haresources  file
>>>>> ------------------------------------------------------
>>>>> joe IPaddr::xxx.xxx.xxx.150 \
>>>>> drbddisk::mail Filesystem::/dev/drbd0::/var/mail/virtual::ext3::defaults apache2 mysql ispcp_daemon \
>>>>> drbddisk::web Filesystem::/dev/drbd1::/var/www::ext3::defaults postfix courier-authdaemon courier-pop courier-imap
>>>>>             
>>>> Looks like you put everything in a single group. You should try
>>>> to split them into several, if possible. For example, I'd assume
>>>> that drbddisk::mail and drbddisk::web don't depend on each other
>>>> and that various services depend on either the former or the
>>>> latter. Then create at least two groups. If all depend on the
>>>> IP address, then all have to be in a single group if you're
>>>> running a v1/haresources based configuration. In that case, you
>>>> would want to consider a v2/crm configuration. At any rate, you
>>>> may consider introducing an extra IP address for the second group
>>>> of services.
>>>>
>>>> See http://linux-ha.org/LearningAboutHeartbeat,
>>>> http://linux-ha.org/HeartbeatTutorials, and
>>>> http://linux-ha.org/GettingStartedV2 for more information.
>>>>
>>>> HTH,
>>>>
>>>> Dejan
>>>>
>>>>         
>>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> Thanks to all for your help and have a great day!
>>>>>
>>>>> -- 
>>>>>
>>>>> Rob Morin
>>>>> Dido Internet Inc.
>>>>> Montreal,Canada
>>>>> http://www.dido.ca
>>>>> 514-990-4444
>>>>>
>>>>> _______________________________________________
>>>>> Linux-HA mailing list
>>>>> [email protected]
>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>> See also: http://linux-ha.org/ReportingProblems
>>>>>             
