Hi Soumya,

> On 09 Jun 2015, at 08:06, Soumya Koduri <[email protected]> wrote:
>
> On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
>> OK, I found at least one of the bugs.
>> The /usr/libexec/ganesha/ganesha.sh script has the following lines:
>>
>> if [ -e /etc/os-release ]; then
>>     RHEL6_PCS_CNAME_OPTION=""
>> fi
>>
>> This is OK for RHEL < 7, but does not work for >= 7. I have changed it to
>> the following to make it work:
>>
>> if [ -e /etc/os-release ]; then
>>     eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
>>     [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
>> fi
>>
> Oh, thanks for the fix. Could you please file a bug for it (and
> probably submit your fix as well)? We shall have it corrected.
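By the way, a possibly cleaner variant of the same check would be to source
/etc/os-release and test its standard ID field, instead of grepping the Red
Hat-specific REDHAT_SUPPORT_PRODUCT variable. An untested sketch, reusing the
script's own RHEL6_PCS_CNAME_OPTION variable:

    if [ -e /etc/os-release ]; then
        . /etc/os-release
        # only clear the option on Fedora, same logic as the fix above
        [ "$ID" = "fedora" ] && RHEL6_PCS_CNAME_OPTION=""
    fi

Either way, the grep-based fix above works.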
Just did it: https://bugzilla.redhat.com/show_bug.cgi?id=1229601

>> Apart from that, the VIP_<node> names I was using were wrong: I should have
>> converted all the '-' to underscores. Maybe this could be mentioned in the
>> documentation once you have it ready.
>> Now the cluster starts, but apparently the VIPs do not:
>>
> Sure. Thanks again for pointing it out. We shall make a note of it.
>
>> Online: [ atlas-node1 atlas-node2 ]
>>
>> Full list of resources:
>>
>>  Clone Set: nfs-mon-clone [nfs-mon]
>>      Started: [ atlas-node1 atlas-node2 ]
>>  Clone Set: nfs-grace-clone [nfs-grace]
>>      Started: [ atlas-node1 atlas-node2 ]
>>  atlas-node1-cluster_ip-1   (ocf::heartbeat:IPaddr):  Stopped
>>  atlas-node1-trigger_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node1
>>  atlas-node2-cluster_ip-1   (ocf::heartbeat:IPaddr):  Stopped
>>  atlas-node2-trigger_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node2
>>  atlas-node1-dead_ip-1      (ocf::heartbeat:Dummy):   Started atlas-node1
>>  atlas-node2-dead_ip-1      (ocf::heartbeat:Dummy):   Started atlas-node2
>>
>> PCSD Status:
>>   atlas-node1: Online
>>   atlas-node2: Online
>>
>> Daemon Status:
>>   corosync: active/disabled
>>   pacemaker: active/disabled
>>   pcsd: active/enabled
>>
> Here corosync and pacemaker show 'disabled'. Can you check the status of
> their services? They should be running prior to cluster creation. We need
> to include that step in the document as well.

Ah, OK, you're right. I have added it to my puppet modules (we install and
configure ganesha via puppet; I'll put the module on puppetforge soon, in
case anyone is interested).

>> But the issue that is puzzling me more is the following:
>>
>> # showmount -e localhost
>> rpc mount export: RPC: Timed out
>>
>> And when I try to enable the ganesha exports on a volume I get this error:
>>
>> # gluster volume set atlas-home-01 ganesha.enable on
>> volume set: failed: Failed to create NFS-Ganesha export config file.
>>
>> But I see the file created in /etc/ganesha/exports/*.conf
>> Still, showmount hangs and times out.
>> Any help?
>> Thanks,
>>
> Hmm, that's strange. We have seen such issues when there was no proper
> cleanup done before re-creating the cluster.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1227709
> http://review.gluster.org/#/c/11093/
>
> Can you please unexport all the volumes and tear down the cluster, using
> 'gluster vol set <volname> ganesha.enable off'

OK:

# gluster vol set atlas-home-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.
# gluster vol set atlas-data-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.

> and the 'gluster ganesha disable' command.

I'm assuming you meant to write nfs-ganesha instead?

# gluster nfs-ganesha disable
ganesha enable : success

A side note (not really important): it's odd that the message printed by a
disable is "ganesha enable" :-)

> Verify that the following files have been deleted on all the nodes:
>
> '/etc/cluster/cluster.conf'

This file is not present at all; I think it's not needed on CentOS 7.

> '/etc/ganesha/ganesha.conf'

It's still there, but empty. I guess that should be OK, right?

> '/etc/ganesha/exports/*'

No more files there.

> '/var/lib/pacemaker/cib'

It's empty.

> Verify that the ganesha service is stopped on all the nodes.

Nope, it's still running; I will stop it.

> Start/restart the services corosync and pcs.

On the node where I issued the nfs-ganesha disable there is no
/etc/corosync/corosync.conf anymore, so corosync won't start. The other node
instead still has the file, which is strange.
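For anyone retracing this later, the teardown sequence Soumya describes boils
down to roughly the following; the nfs-ganesha unit name is what I have on
CentOS 7, so double-check it against your packaging:

    # once, from any node: unexport every volume, then tear down the HA cluster
    gluster vol set <volname> ganesha.enable off
    gluster nfs-ganesha disable

    # then, on every node, make sure the cleanup really happened
    systemctl stop nfs-ganesha            # ganesha itself must be down
    ls /etc/cluster/cluster.conf /etc/ganesha/exports/ /var/lib/pacemaker/cib/
    cat /etc/ganesha/ganesha.conf         # empty is fine
    systemctl restart pcsd corosync       # corosync needs /etc/corosync/corosync.conf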
> And re-try the HA cluster creation with 'gluster ganesha enable'.

This time (repeated twice) it did not work at all:

# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun 9 10:13:43 2015
Last change: Tue Jun 9 10:13:22 2015
Stack: corosync
Current DC: atlas-node1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
6 Resources configured

Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ atlas-node1 atlas-node2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ atlas-node1 atlas-node2 ]
 atlas-node2-dead_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node1
 atlas-node1-dead_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node2

PCSD Status:
  atlas-node1: Online
  atlas-node2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

I then tried "pcs cluster destroy" on both nodes and ran nfs-ganesha enable
again, but now I'm back to the old problem:

# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun 9 10:22:27 2015
Last change: Tue Jun 9 10:17:00 2015
Stack: corosync
Current DC: atlas-node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
10 Resources configured

Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ atlas-node1 atlas-node2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ atlas-node1 atlas-node2 ]
 atlas-node1-cluster_ip-1   (ocf::heartbeat:IPaddr):  Stopped
 atlas-node1-trigger_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node1
 atlas-node2-cluster_ip-1   (ocf::heartbeat:IPaddr):  Stopped
 atlas-node2-trigger_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node2
 atlas-node1-dead_ip-1      (ocf::heartbeat:Dummy):   Started atlas-node1
 atlas-node2-dead_ip-1      (ocf::heartbeat:Dummy):   Started atlas-node2

PCSD Status:
  atlas-node1: Online
  atlas-node2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Cheers,

    Alessandro

>
> Thanks,
> Soumya
>
>> Alessandro
>>
>>> On 08 Jun 2015, at 20:00, Alessandro De Salvo
>>> <[email protected]> wrote:
>>>
>>> Hi,
>>> indeed, it does not work :-)
>>> OK, this is what I did, with 2 machines running CentOS 7.1, GlusterFS
>>> 3.7.1 and nfs-ganesha 2.2.0:
>>>
>>> 1) ensured that the machines are able to resolve their IPs (this was
>>> already true since they were in the DNS);
>>> 2) disabled NetworkManager and enabled network on both machines;
>>> 3) created a gluster shared volume 'gluster_shared_storage' and mounted
>>> it on '/run/gluster/shared_storage' on all the cluster nodes using the
>>> glusterfs native mount (on CentOS 7.1 there is a default link
>>> /var/run -> ../run);
>>> 4) created an empty /etc/ganesha/ganesha.conf;
>>> 5) installed pacemaker, pcs, resource-agents and corosync on all cluster
>>> machines;
>>> 6) set the same password for the 'hacluster' user on all machines;
>>> 7) ran pcs cluster auth <hostname> -u hacluster -p <pass> on all the
>>> nodes (on both nodes I issued the command for both nodes);
>>> 8) IPv6 is configured by default on all nodes, although the
>>> infrastructure is not ready for IPv6;
>>> 9) enabled and started pcsd on all nodes;
>>> 10) populated /etc/ganesha/ganesha-ha.conf with the following contents,
>>> one per machine:
>>>
>>> ===> atlas-node1
>>> # Name of the HA cluster created.
>>> HA_NAME="ATLAS_GANESHA_01"
>>> # The server from which you intend to mount
>>> # the shared volume.
>>> HA_VOL_SERVER="atlas-node1"
>>> # The subset of nodes of the Gluster Trusted Pool
>>> # that forms the ganesha HA cluster. IP/Hostname
>>> # is specified.
>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>> # Virtual IPs of each of the nodes specified above.
>>> VIP_atlas-node1="x.x.x.1"
>>> VIP_atlas-node2="x.x.x.2"
>>>
>>> ===> atlas-node2
>>> # Name of the HA cluster created.
>>> HA_NAME="ATLAS_GANESHA_01"
>>> # The server from which you intend to mount
>>> # the shared volume.
>>> HA_VOL_SERVER="atlas-node2"
>>> # The subset of nodes of the Gluster Trusted Pool
>>> # that forms the ganesha HA cluster. IP/Hostname
>>> # is specified.
>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>> # Virtual IPs of each of the nodes specified above.
>>> VIP_atlas-node1="x.x.x.1"
>>> VIP_atlas-node2="x.x.x.2"
>>>
>>> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic message:
>>>
>>> # gluster nfs-ganesha enable
>>> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted
>>> pool. Do you still want to continue? (y/n) y
>>> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha. Please
>>> check the log file for details
>>>
>>> Looking at the logs I found nothing really special but this:
>>>
>>> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
>>> [2015-06-08 17:57:15.672844] I [MSGID: 106132]
>>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already
>>> stopped
>>> [2015-06-08 17:57:15.675395] I [glusterd-ganesha.c:386:check_host_list]
>>> 0-management: ganesha host found Hostname is atlas-node2
>>> [2015-06-08 17:57:15.720692] I [glusterd-ganesha.c:386:check_host_list]
>>> 0-management: ganesha host found Hostname is atlas-node2
>>> [2015-06-08 17:57:15.721161] I [glusterd-ganesha.c:335:is_ganesha_host]
>>> 0-management: ganesha host found Hostname is atlas-node2
>>> [2015-06-08 17:57:16.633048] E
>>> [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management: Initial
>>> NFS-Ganesha set up failed
>>> [2015-06-08 17:57:16.641563] E [glusterd-syncop.c:1396:gd_commit_op_phase]
>>> 0-management: Commit of operation 'Volume (null)' failed on localhost :
>>> Failed to set up HA config for NFS-Ganesha. Please check the log file for
>>> details
>>>
>>> ==> /var/log/glusterfs/cmd_history.log <==
>>> [2015-06-08 17:57:16.643615] : nfs-ganesha enable : FAILED : Failed to set
>>> up HA config for NFS-Ganesha. Please check the log file for details
>>>
>>> ==> /var/log/glusterfs/cli.log <==
>>> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting with: -1
>>>
>>> Also, pcs seems to be fine for the auth part, although it obviously tells
>>> me the cluster is not running.
>>>
>>> I, [2015-06-08T19:57:16.305323 #7223] INFO -- : Running:
>>> /usr/sbin/corosync-cmapctl totem.cluster_name
>>> I, [2015-06-08T19:57:16.345457 #7223] INFO -- : Running: /usr/sbin/pcs
>>> cluster token-nodes
>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth
>>> HTTP/1.1" 200 68 0.1919
>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth
>>> HTTP/1.1" 200 68 0.1920
>>> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET
>>> /remote/check_auth HTTP/1.1" 200 68
>>> - -> /remote/check_auth
>>>
>>> What am I doing wrong?
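(As it turned out further up in the thread, this was partly the VIP naming:
the dashes in the hostnames have to become underscores in the VIP_<node>
keys. With that change, and plain ASCII quotes in case the mailer mangled
them, the file on atlas-node1 should look roughly like this, the addresses
being placeholders:

    HA_NAME="ATLAS_GANESHA_01"
    HA_VOL_SERVER="atlas-node1"
    HA_CLUSTER_NODES="atlas-node1,atlas-node2"
    VIP_atlas_node1="x.x.x.1"
    VIP_atlas_node2="x.x.x.2"

and the same on atlas-node2, with HA_VOL_SERVER="atlas-node2".)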
>>> Thanks,
>>>
>>>    Alessandro
>>>
>>>> On 08 Jun 2015, at 19:30, Soumya Koduri <[email protected]> wrote:
>>>>
>>>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
>>>>> Sorry, just another question:
>>>>>
>>>>> - in my installation of gluster 3.7.1 the command gluster
>>>>> features.ganesha enable does not work:
>>>>>
>>>>> # gluster features.ganesha enable
>>>>> unrecognized word: features.ganesha (position 0)
>>>>>
>>>>> Which version has full support for it?
>>>>
>>>> Sorry, this option has recently been changed. It is now
>>>>
>>>> $ gluster nfs-ganesha enable
>>>>
>>>>> - in the documentation the ccs and cman packages are required, but they
>>>>> seem not to be available anymore on CentOS 7 and similar. I guess they
>>>>> are not really required anymore, as pcs should do the full job.
>>>>>
>>>>> Thanks,
>>>>>
>>>>>    Alessandro
>>>>
>>>> Looks like so from http://clusterlabs.org/quickstart-redhat.html. Let us
>>>> know if it doesn't work.
>>>>
>>>> Thanks,
>>>> Soumya
>>>>
>>>>>> On 08 Jun 2015, at 15:09, Alessandro De Salvo
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Great, many thanks Soumya!
>>>>>> Cheers,
>>>>>>
>>>>>>    Alessandro
>>>>>>
>>>>>>> On 08 Jun 2015, at 13:53, Soumya Koduri
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Please find the slides of the demo video at [1].
>>>>>>>
>>>>>>> We recommend a distributed replica volume as the shared volume, for
>>>>>>> better data availability.
>>>>>>>
>>>>>>> The size of the volume depends on the workload you may have. Since it
>>>>>>> is used to maintain the state of NLM/NFSv4 clients, you may take as a
>>>>>>> minimum size the aggregate of
>>>>>>> (typical size of the '/var/lib/nfs' directory +
>>>>>>> ~4k * number of clients connected to each NFS server at any point).
>>>>>>>
>>>>>>> We shall document this feature in the gluster docs soon as well.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Soumya
>>>>>>>
>>>>>>> [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
>>>>>>>
>>>>>>> On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
>>>>>>>> Hi,
>>>>>>>> I have seen the demo video on ganesha HA,
>>>>>>>> https://www.youtube.com/watch?v=Z4mvTQC-efM
>>>>>>>> However, there is no advice on the appropriate size of the shared
>>>>>>>> volume. How is it really used, and what would be a reasonable size
>>>>>>>> for it?
>>>>>>>> Also, are the slides from the video available somewhere, as well as
>>>>>>>> documentation on all this? I did not manage to find them.
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>    Alessandro
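P.S. For anyone setting this up from scratch on CentOS 7, the prerequisites
from my list above boil down to roughly the following per node (package and
service names as I used them; the password is a placeholder):

    yum -y install pacemaker pcs resource-agents corosync
    echo '<pass>' | passwd --stdin hacluster    # same password on all nodes
    systemctl enable pcsd && systemctl start pcsd
    systemctl enable corosync pacemaker         # they showed 'disabled' above
    touch /etc/ganesha/ganesha.conf             # an empty file is enough to start
    pcs cluster auth atlas-node1 atlas-node2 -u hacluster -p '<pass>'

P.P.S. A purely illustrative number for the sizing formula above: with
/var/lib/nfs at about 1 MB per server and, say, 500 clients per server on our
two nodes, that is 2 x (1 MB + 4 KB x 500) ≈ 6 MB, so even a small replicated
shared volume of a few GB has plenty of headroom.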
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
