Yeah, thanks. I checked your thread (if you meant "clvmd hangs"), but it
seems unfinished: I only see three entries in that thread and,
unfortunately, no solution at the end. Am I missing something?
My scenario is a bit different anyway: I don't need GFS, only clvmd with
a failover LVM, since this is an active/passive configuration. And my
clvmd hangs only rarely; my main problem is that all the volumes remain
inactive.
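
For reference, this is roughly what I check and do by hand after clvmd comes
up, before starting rgmanager (a minimal sketch using the VG/LV names from the
test cluster quoted below; the attr-bit reading is standard LVM, but treat the
workaround itself as my assumption, not a fix):

[root@rhel2 ~]# vgs -o vg_name,vg_attr teszt    # a trailing 'c' in the attr string means the VG is marked clustered
[root@rhel2 ~]# lvs -o lv_name,lv_attr teszt    # an 'a' in the fifth attr position means the LV is already active
[root@rhel2 ~]# lvchange -an teszt/teszt-lv     # deactivate by hand so rgmanager's lvm resource can activate it itself
[root@rhel2 ~]# clustat                         # then enable/relocate the service and watch whether activation succeeds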

On 08/10/2012 07:00 PM, Chip Burke wrote:
> See my earlier thread, as I am having similar issues. I will be testing this
> soon, but I "think" the issue in my case is setting up SCSI fencing before
> GFS2. So essentially it has nothing to fence off of, sees it as a fault,
> and never recovers. I "think" my fix will be to establish the LVMs, GFS2, etc.,
> and then put in the SCSI fence so that it can actually create the private
> reservations. Then the fun begins: pulling the plug randomly to see how
> it behaves.
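>
> A quick way to confirm whether fence_scsi ever registered anything on the
> shared LUN (only a sketch; sg_persist comes from sg3_utils, and the device
> path below is a placeholder):
>
> [root@rhel1 ~]# sg_persist -n -i -k -d /dev/mapper/mpathX   # list registered keys; empty output means no node has registered
> [root@rhel1 ~]# sg_persist -n -i -r -d /dev/mapper/mpathX   # show the current SCSI-3 reservation, if any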
> ________________________________________
> Chip Burke
> 
> On 8/10/12 12:46 PM, "Digimer" <li...@alteeve.ca> wrote:
> 
>> Not sure if it relates, but I can say that without fencing, things will
>> break in strange ways. The reason is that if anything triggers a fault,
>> the cluster blocks by design and stays blocked until a fence call
>> succeeds (which is impossible without fencing configured in the first
>> place).
>>
>> Can you please set up fencing and test to make sure it works (using
>> 'fence_node rhel2.local' from rhel1.local, then in reverse)? Once this
>> is done, test again for your problem. If it still exists, please paste
>> the updated cluster.conf. Also please include syslog from both
>> nodes around the time of your LVM tests.
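>>
>> A minimal test sketch, using the hostnames from your cluster.conf (run as
>> root; fence_node and fence_tool are part of the cman/fence stack):
>>
>> [root@rhel1 ~]# fence_node rhel2.local   # rhel2 should be reset/cut off and then rejoin the cluster
>> [root@rhel2 ~]# fence_node rhel1.local   # once rhel2 is back, repeat in the other direction
>> [root@rhel1 ~]# fence_tool ls            # confirm the fence domain has both members and no fence is pending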
>>
>> digimer
>>
>> On 08/10/2012 12:38 PM, Poós Krisztián wrote:
>>> This is the cluster.conf, which is a clone of the problematic system in
>>> a test environment (without the Oracle and SAP instances, only focusing
>>> on this LVM issue, with an LVM resource):
>>>
>>> [root@rhel2 ~]# cat /etc/cluster/cluster.conf
>>> <?xml version="1.0"?>
>>> <cluster config_version="7" name="teszt">
>>>     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>     <clusternodes>
>>>             <clusternode name="rhel1.local" nodeid="1" votes="1">
>>>                     <fence/>
>>>             </clusternode>
>>>             <clusternode name="rhel2.local" nodeid="2" votes="1">
>>>                     <fence/>
>>>             </clusternode>
>>>     </clusternodes>
>>>     <cman expected_votes="3"/>
>>>     <fencedevices/>
>>>     <rm>
>>>             <failoverdomains>
>>>                     <failoverdomain name="all" nofailback="1" ordered="1" restricted="0">
>>>                             <failoverdomainnode name="rhel1.local" priority="1"/>
>>>                             <failoverdomainnode name="rhel2.local" priority="2"/>
>>>                     </failoverdomain>
>>>             </failoverdomains>
>>>             <resources>
>>>                     <lvm lv_name="teszt-lv" name="teszt-lv" vg_name="teszt"/>
>>>                     <fs device="/dev/teszt/teszt-lv" fsid="43679" fstype="ext4" mountpoint="/lvm" name="teszt-fs"/>
>>>             </resources>
>>>             <service autostart="1" domain="all" exclusive="0" name="teszt" recovery="disable">
>>>                     <lvm ref="teszt-lv"/>
>>>                     <fs ref="teszt-fs"/>
>>>             </service>
>>>     </rm>
>>>     <quorumd label="qdisk"/>
>>> </cluster>
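>>>
>>> For what it's worth, the resource part of this config can also be exercised
>>> outside rgmanager (a sketch; ccs_config_validate and rg_test ship with the
>>> cluster/rgmanager packages, and this is only how I would poke at it, not
>>> output from the system above):
>>>
>>> [root@rhel2 ~]# ccs_config_validate                                          # check cluster.conf against the schema
>>> [root@rhel2 ~]# rg_test test /etc/cluster/cluster.conf start service teszt   # run the lvm/fs agents by hand and show their errors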
>>>
>>> Here are the log parts:
>>> Aug 10 17:21:21 rgmanager I am node #2
>>> Aug 10 17:21:22 rgmanager Resource Group Manager Starting
>>> Aug 10 17:21:22 rgmanager Loading Service Data
>>> Aug 10 17:21:29 rgmanager Initializing Services
>>> Aug 10 17:21:31 rgmanager /dev/dm-2 is not mounted
>>> Aug 10 17:21:31 rgmanager Services Initialized
>>> Aug 10 17:21:31 rgmanager State change: Local UP
>>> Aug 10 17:21:31 rgmanager State change: rhel1.local UP
>>> Aug 10 17:23:23 rgmanager Starting stopped service service:teszt
>>> Aug 10 17:23:25 rgmanager Failed to activate logical volume, teszt/teszt-lv
>>> Aug 10 17:23:25 rgmanager Attempting cleanup of teszt/teszt-lv
>>> Aug 10 17:23:29 rgmanager Failed second attempt to activate teszt/teszt-lv
>>> Aug 10 17:23:29 rgmanager start on lvm "teszt-lv" returned 1 (generic error)
>>> Aug 10 17:23:29 rgmanager #68: Failed to start service:teszt; return value: 1
>>> Aug 10 17:23:29 rgmanager Stopping service service:teszt
>>> Aug 10 17:23:30 rgmanager stop: Could not match /dev/teszt/teszt-lv with a real device
>>> Aug 10 17:23:30 rgmanager stop on fs "teszt-fs" returned 2 (invalid argument(s))
>>> Aug 10 17:23:31 rgmanager #12: RG service:teszt failed to stop; intervention required
>>> Aug 10 17:23:31 rgmanager Service service:teszt is failed
>>> Aug 10 17:24:09 rgmanager #43: Service service:teszt has failed; can not start.
>>> Aug 10 17:24:09 rgmanager #13: Service service:teszt failed to stop cleanly
>>> Aug 10 17:25:12 rgmanager Starting stopped service service:teszt
>>> Aug 10 17:25:14 rgmanager Failed to activate logical volume, teszt/teszt-lv
>>> Aug 10 17:25:15 rgmanager Attempting cleanup of teszt/teszt-lv
>>> Aug 10 17:25:17 rgmanager Failed second attempt to activate teszt/teszt-lv
>>> Aug 10 17:25:18 rgmanager start on lvm "teszt-lv" returned 1 (generic error)
>>> Aug 10 17:25:18 rgmanager #68: Failed to start service:teszt; return value: 1
>>> Aug 10 17:25:18 rgmanager Stopping service service:teszt
>>> Aug 10 17:25:19 rgmanager stop: Could not match /dev/teszt/teszt-lv with a real device
>>> Aug 10 17:25:19 rgmanager stop on fs "teszt-fs" returned 2 (invalid argument(s))
>>>
>>>
>>> After I manually started the LVM on node1 and then tried to switch it to
>>> node2, it was not able to start it there.
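>>>
>>> A diagnostic sketch for the node that refuses to activate (the names are the
>>> ones from this thread; that a stale tag or a held exclusive activation is the
>>> blocker is only my assumption, not a confirmed cause):
>>>
>>> [root@rhel2 ~]# lvs -o lv_name,lv_attr,lv_tags teszt   # HA-LVM in tag mode will not activate an LV still tagged for the other node
>>> [root@rhel2 ~]# vgs -o vg_name,vg_attr teszt           # a trailing 'c' means the VG is still flagged as clustered
>>> [root@rhel2 ~]# lvchange -aey teszt/teszt-lv           # retry an explicit exclusive activation by hand and note the exact error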
>>>
>>> Regards,
>>> Krisztian
>>>
>>>
>>> On 08/10/2012 05:15 PM, Digimer wrote:
>>>> On 08/10/2012 11:07 AM, Poós Krisztián wrote:
>>>>> Dear all,
>>>>>
>>>>> I hope someone has run into this problem in the past and can help
>>>>> me resolve this issue.
>>>>>
>>>>> There is a 2-node RHEL cluster with a quorum disk as well.
>>>>> There are clustered LVM volumes on which the -c- (clustered) flag is set.
>>>>> If I start clvmd, all the clustered LVs become active.
>>>>>
>>>>> After that, if I start rgmanager, it deactivates all the volumes and
>>>>> is then unable to activate them again, as the devices are no longer
>>>>> there during the startup of the service, so the service fails.
>>>>> All LVs remain without the active flag.
>>>>>
>>>>> I can bring it up manually, but only if, after clvmd has started, I
>>>>> first deactivate the LVs myself with lvchange -an <lv>.
>>>>> After that, when I start rgmanager, it can bring the service online
>>>>> without problems. However, I think this step should be done by
>>>>> rgmanager itself. The logs are full of the following:
>>>>> rgmanager Making resilient: lvchange -an ....
>>>>> rgmanager lv_exec_resilient failed
>>>>> rgmanager lv_activate_resilient stop failed on ....
>>>>>
>>>>> In addition, the lvs/clvmd commands themselves sometimes hang, and I
>>>>> have to restart clvmd (sometimes kill it) to make them work again.
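>>>>>
>>>>> When the commands hang, the lock and fence state may show whether the
>>>>> cluster is blocked waiting on a fence (only a sketch of what could be
>>>>> checked; these tools ship with the cman/dlm packages):
>>>>>
>>>>> [root@rhel1 ~]# fence_tool ls   # a wait state or pending victim means everything blocks until fencing succeeds
>>>>> [root@rhel1 ~]# dlm_tool ls     # the clvmd lockspace should be listed and not stuck joining or leaving
>>>>> [root@rhel1 ~]# clustat         # confirm both nodes and the quorum disk are seen as online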
>>>>>
>>>>> Does anyone have any idea what to check?
>>>>>
>>>>> Thanks and regards,
>>>>> Krisztian
>>>>
>>>> Please paste your cluster.conf file with minimal edits.
>>
>>
>> -- 
>> Digimer
>> Papers and Projects: https://alteeve.com
>>


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
