Hi Soumya,

> On 09 Jun 2015, at 08:06, Soumya Koduri <[email protected]> wrote:
> 
> 
> 
> On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
>> OK, I found at least one of the bugs.
>> The /usr/libexec/ganesha/ganesha.sh has the following lines:
>> 
>>     if [ -e /etc/os-release ]; then
>>         RHEL6_PCS_CNAME_OPTION=""
>>     fi
>> 
>> This is OK for RHEL < 7, where /etc/os-release does not exist, but it breaks on 
>> RHEL/CentOS >= 7, where the file is present too. I have changed it to the 
>> following to make it work:
>> 
>>     if [ -e /etc/os-release ]; then
>>         eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
>>         [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && 
>> RHEL6_PCS_CNAME_OPTION=""
>>     fi
>> 
> Oh, thanks for the fix. Could you please file a bug for this (and perhaps 
> submit your fix as well)? We shall have it corrected.

Just did it, https://bugzilla.redhat.com/show_bug.cgi?id=1229601
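
By the way, an alternative to grepping for REDHAT_SUPPORT_PRODUCT (which is a Red
Hat specific field) could be to key off the standard ID field of /etc/os-release.
This is just a sketch, not necessarily what should go into the final patch:

    if [ -e /etc/os-release ]; then
        # /etc/os-release exists on Fedora as well as on RHEL/CentOS >= 7,
        # so decide based on the distribution rather than on the file's presence
        . /etc/os-release
        [ "$ID" == "fedora" ] && RHEL6_PCS_CNAME_OPTION=""
    fi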

> 
>> Apart from that, the VIP_<node> entries I was using were wrong: I should have 
>> converted all the "-" in the node names to underscores (see the example just 
>> below). Maybe this could be mentioned in the documentation once you have it 
>> ready.
>> Now the cluster starts, but the VIPs apparently do not:
>> 
> Sure. Thanks again for pointing it out. We shall make a note of it.
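
(For the record, with my node names the entries that actually work are spelled
with underscores in the variable names, e.g.:

VIP_atlas_node1="x.x.x.1"
VIP_atlas_node2="x.x.x.2"

i.e. only the dashes inside the VIP_<node> variable names are converted; the
hostnames in HA_CLUSTER_NODES keep their dashes.)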
> 
>> Online: [ atlas-node1 atlas-node2 ]
>> 
>> Full list of resources:
>> 
>>  Clone Set: nfs-mon-clone [nfs-mon]
>>      Started: [ atlas-node1 atlas-node2 ]
>>  Clone Set: nfs-grace-clone [nfs-grace]
>>      Started: [ atlas-node1 atlas-node2 ]
>>  atlas-node1-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
>>  atlas-node1-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
>>  atlas-node2-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
>>  atlas-node2-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
>>  atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
>>  atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2
>> 
>> PCSD Status:
>>   atlas-node1: Online
>>   atlas-node2: Online
>> 
>> Daemon Status:
>>   corosync: active/disabled
>>   pacemaker: active/disabled
>>   pcsd: active/enabled
>> 
>> 
> Here corosync and pacemaker show the 'disabled' state. Can you check the status 
> of their services? They should be running prior to cluster creation. We need 
> to include that step in the document as well.

Ah, OK, you're right. I have added that step to my Puppet modules (we install and 
configure ganesha via Puppet; I'll put the module on Puppet Forge soon, in case 
anyone is interested).
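
Concretely, what I added to the module is roughly equivalent to running the
following on each node (assuming the stock CentOS 7 unit names):

# systemctl enable corosync pacemaker pcsd
# systemctl start pcsd

i.e. the services are enabled so that they survive a reboot, and pcsd is started
up front, since both 'pcs cluster auth' and the HA setup need it running.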

> 
>> But the issue that is puzzling me more is the following:
>> 
>> # showmount -e localhost
>> rpc mount export: RPC: Timed out
>> 
>> And when I try to enable the ganesha exports on a volume I get this error:
>> 
>> # gluster volume set atlas-home-01 ganesha.enable on
>> volume set: failed: Failed to create NFS-Ganesha export config file.
>> 
>> But I see the file created in /etc/ganesha/exports/*.conf
>> Still, showmount hangs and times out.
>> Any help?
>> Thanks,
>> 
> Hmm, that's strange. We have sometimes seen such issues when there was no proper 
> cleanup done before re-creating the cluster.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1227709
> 
> http://review.gluster.org/#/c/11093/
> 
> Can you please unexport all the volumes using
> 'gluster vol set <volname> ganesha.enable off', and tear down the cluster using the

OK:

# gluster vol set atlas-home-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.

# gluster vol set atlas-data-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.


> 'gluster ganesha disable' command.

I'm assuming you meant to write nfs-ganesha instead?

# gluster nfs-ganesha disable
ganesha enable : success


A side note (not really important): it's odd that when I issue a disable, the 
success message says "ganesha enable" :-)

> 
> Verify if the following files have been deleted on all the nodes-
> '/etc/cluster/cluster.conf’

This file is not present at all; I think it's not needed on CentOS 7.

> '/etc/ganesha/ganesha.conf’,

it’s still there, but empty, and I guess it should be OK, right?

> '/etc/ganesha/exports/*’

no more files there

> '/var/lib/pacemaker/cib’

it’s empty

> 
> Verify if the ganesha service is stopped on all the nodes.

Nope, it's still running; I will stop it.
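
On both nodes that is simply (assuming the stock unit name):

# systemctl stop nfs-ganesha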

> 
> start/restart the services - corosync, pcs.

On the node where I issued the nfs-ganesha disable there is no longer any 
/etc/corosync/corosync.conf, so corosync won't start. The other node, however, 
still has the file, which is strange.

> 
> And re-try the HA cluster creation
> 'gluster ganesha enable’

This time (I tried it twice) it did not work at all:

# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun  9 10:13:43 2015
Last change: Tue Jun  9 10:13:22 2015
Stack: corosync
Current DC: atlas-node1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
6 Resources configured


Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ atlas-node1 atlas-node2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ atlas-node1 atlas-node2 ]
 atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1 
 atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2 

PCSD Status:
  atlas-node1: Online
  atlas-node2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled



I then tried "pcs cluster destroy" on both nodes, and then nfs-ganesha enable 
again, but now I'm back to the old problem:

# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun  9 10:22:27 2015
Last change: Tue Jun  9 10:17:00 2015
Stack: corosync
Current DC: atlas-node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
10 Resources configured


Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ atlas-node1 atlas-node2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ atlas-node1 atlas-node2 ]
 atlas-node1-cluster_ip-1       (ocf::heartbeat:IPaddr):        Stopped 
 atlas-node1-trigger_ip-1       (ocf::heartbeat:Dummy): Started atlas-node1 
 atlas-node2-cluster_ip-1       (ocf::heartbeat:IPaddr):        Stopped 
 atlas-node2-trigger_ip-1       (ocf::heartbeat:Dummy): Started atlas-node2 
 atlas-node1-dead_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1 
 atlas-node2-dead_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2 

PCSD Status:
  atlas-node1: Online
  atlas-node2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
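
For the record, the cleanup I am now running before every retry is roughly the
following, based on your checklist (some of the removals are probably redundant
after "pcs cluster destroy"). First, once for the whole pool:

# gluster vol set <volname> ganesha.enable off    (for every exported volume)
# gluster nfs-ganesha disable

and then on every node:

# systemctl stop nfs-ganesha
# pcs cluster destroy
# rm -f /etc/corosync/corosync.conf
# rm -f /etc/ganesha/exports/*.conf
# rm -rf /var/lib/pacemaker/cib/*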


Cheers,

        Alessandro

> 
> 
> Thanks,
> Soumya
> 
>>      Alessandro
>> 
>>> On 08 Jun 2015, at 20:00, Alessandro De Salvo 
>>> <[email protected]> wrote:
>>> 
>>> Hi,
>>> indeed, it does not work :-)
>>> OK, this is what I did, with 2 machines running CentOS 7.1, GlusterFS 
>>> 3.7.1 and nfs-ganesha 2.2.0:
>>> 
>>> 1) ensured that the machines are able to resolve their IPs (but this was 
>>> already true since they were in the DNS);
>>> 2) disabled NetworkManager and enabled network on both machines;
>>> 3) created a gluster shared volume 'gluster_shared_storage' and mounted it 
>>> on '/run/gluster/shared_storage' on all the cluster nodes using glusterfs 
>>> native mount (on CentOS 7.1 there is a link by default /var/run -> ../run)
>>> 4) created an empty /etc/ganesha/ganesha.conf;
>>> 5) installed pacemaker pcs resource-agents corosync on all cluster machines;
>>> 6) set the same password for the 'hacluster' user on all machines;
>>> 7) ran pcs cluster auth <hostname> -u hacluster -p <pass> on all the nodes (on 
>>> each node I issued the command for both nodes);
>>> 8) IPv6 is configured by default on all nodes, although the infrastructure 
>>> is not ready for IPv6
>>> 9) enabled pcsd and started it on all nodes
>>> 10) populated /etc/ganesha/ganesha-ha.conf with the following contents, one 
>>> per machine:
>>> 
>>> 
>>> ===> atlas-node1
>>> # Name of the HA cluster created.
>>> HA_NAME="ATLAS_GANESHA_01"
>>> # The server from which you intend to mount
>>> # the shared volume.
>>> HA_VOL_SERVER="atlas-node1"
>>> # The subset of nodes of the Gluster Trusted Pool
>>> # that forms the ganesha HA cluster. IP/Hostname
>>> # is specified.
>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>> # Virtual IPs of each of the nodes specified above.
>>> VIP_atlas-node1="x.x.x.1"
>>> VIP_atlas-node2="x.x.x.2"
>>> 
>>> ===> atlas-node2
>>> # Name of the HA cluster created.
>>> HA_NAME="ATLAS_GANESHA_01"
>>> # The server from which you intend to mount
>>> # the shared volume.
>>> HA_VOL_SERVER="atlas-node2"
>>> # The subset of nodes of the Gluster Trusted Pool
>>> # that forms the ganesha HA cluster. IP/Hostname
>>> # is specified.
>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>> # Virtual IPs of each of the nodes specified above.
>>> VIP_atlas-node1="x.x.x.1"
>>> VIP_atlas-node2="x.x.x.2"
>>> 
>>> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic message:
>>> 
>>> # gluster nfs-ganesha enable
>>> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted 
>>> pool. Do you still want to continue? (y/n) y
>>> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha. Please 
>>> check the log file for details
>>> 
>>> Looking at the logs I found nothing really special except this:
>>> 
>>> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
>>> [2015-06-08 17:57:15.672844] I [MSGID: 106132] 
>>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already 
>>> stopped
>>> [2015-06-08 17:57:15.675395] I [glusterd-ganesha.c:386:check_host_list] 
>>> 0-management: ganesha host found Hostname is atlas-node2
>>> [2015-06-08 17:57:15.720692] I [glusterd-ganesha.c:386:check_host_list] 
>>> 0-management: ganesha host found Hostname is atlas-node2
>>> [2015-06-08 17:57:15.721161] I [glusterd-ganesha.c:335:is_ganesha_host] 
>>> 0-management: ganesha host found Hostname is atlas-node2
>>> [2015-06-08 17:57:16.633048] E 
>>> [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management: Initial 
>>> NFS-Ganesha set up failed
>>> [2015-06-08 17:57:16.641563] E [glusterd-syncop.c:1396:gd_commit_op_phase] 
>>> 0-management: Commit of operation 'Volume (null)' failed on localhost : 
>>> Failed to set up HA config for NFS-Ganesha. Please check the log file for 
>>> details
>>> 
>>> ==> /var/log/glusterfs/cmd_history.log <==
>>> [2015-06-08 17:57:16.643615]  : nfs-ganesha enable : FAILED : Failed to set 
>>> up HA config for NFS-Ganesha. Please check the log file for details
>>> 
>>> ==> /var/log/glusterfs/cli.log <==
>>> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting with: -1
>>> 
>>> 
>>> Also, pcs seems to be fine for the auth part, although it obviously tells 
>>> me the cluster is not running.
>>> 
>>> I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running: 
>>> /usr/sbin/corosync-cmapctl totem.cluster_name
>>> I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running: /usr/sbin/pcs 
>>> cluster token-nodes
>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth 
>>> HTTP/1.1" 200 68 0.1919
>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth 
>>> HTTP/1.1" 200 68 0.1920
>>> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET 
>>> /remote/check_auth HTTP/1.1" 200 68
>>> - -> /remote/check_auth
>>> 
>>> 
>>> What am I doing wrong?
>>> Thanks,
>>> 
>>>     Alessandro
>>> 
>>>> On 08 Jun 2015, at 19:30, Soumya Koduri <[email protected]> 
>>>> wrote:
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
>>>>> Sorry, just another question:
>>>>> 
>>>>> - in my installation of gluster 3.7.1 the command gluster 
>>>>> features.ganesha enable does not work:
>>>>> 
>>>>> # gluster features.ganesha enable
>>>>> unrecognized word: features.ganesha (position 0)
>>>>> 
>>>>> Which version has full support for it?
>>>> 
>>>> Sorry. This option has recently been changed. It is now
>>>> 
>>>> $ gluster nfs-ganesha enable
>>>> 
>>>> 
>>>>> 
>>>>> - in the documentation the ccs and cman packages are listed as required, but 
>>>>> they seem not to be available anymore on CentOS 7 and similar; I guess they 
>>>>> are not really needed anymore, as pcs should do the whole job
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>>   Alessandro
>>>> 
>>>> Looks like so from http://clusterlabs.org/quickstart-redhat.html. Let us 
>>>> know if it doesn't work.
>>>> 
>>>> Thanks,
>>>> Soumya
>>>> 
>>>>> 
>>>>>> On 08 Jun 2015, at 15:09, Alessandro De Salvo 
>>>>>> <[email protected]> wrote:
>>>>>> 
>>>>>> Great, many thanks Soumya!
>>>>>> Cheers,
>>>>>> 
>>>>>>  Alessandro
>>>>>> 
>>>>>>> On 08 Jun 2015, at 13:53, Soumya Koduri 
>>>>>>> <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Please find the slides of the demo video at [1]
>>>>>>> 
>>>>>>> We recommend using a distributed replicate volume as the shared volume, 
>>>>>>> for better data availability.
>>>>>>> 
>>>>>>> The size of the volume depends on your workload. Since it is used to 
>>>>>>> maintain the state of NLM/NFSv4 clients, you can take as a minimum size 
>>>>>>> the aggregate, over all the NFS servers, of
>>>>>>> (typical size of the '/var/lib/nfs' directory + 
>>>>>>> ~4 KB * number of clients connected to each NFS server at any point in time)
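
(Just to sanity-check my sizing against that formula: with 2 NFS servers and,
say, a few hundred clients each, it comes out to roughly 2 * (~1 MB + 4 KB * 300),
i.e. well under 10 MB, so practically any small shared volume should be enough.)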
>>>>>>> 
>>>>>>> We shall document this feature soon in the Gluster docs as well.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Soumya
>>>>>>> 
>>>>>>> [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
>>>>>>> 
>>>>>>> On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
>>>>>>>> Hi,
>>>>>>>> I have seen the demo video on ganesha HA, 
>>>>>>>> https://www.youtube.com/watch?v=Z4mvTQC-efM
>>>>>>>> However, there is no advice on the appropriate size of the shared 
>>>>>>>> volume. How is it really used, and what would be a reasonable size 
>>>>>>>> for it?
>>>>>>>> Also, are the slides from the video available somewhere, as well as any 
>>>>>>>> documentation on all this? I did not manage to find them.
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>>        Alessandro
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
