Hi,

On Fri, Apr 23, 2010 at 03:05:31PM -0700, Sid Stuart wrote:
> This is a tricky problem. I think I have tracked it down, but am still
> working to confirm as I don't understand the steps pacemaker goes through to
> start a group.
>
> My understanding of how this works from the logs is,
>
> Pacemaker calls "exportfs monitor" as part of an initialization process.
It does resource probing.

> It
> does this before the preceding RAs in the group are started. The group in
> question consists of:
>
>  Resource Group: shared
>      shared_fs (ocf::heartbeat:Filesystem) Started
>      shared_ip (ocf::heartbeat:IPaddr2) Started
>      shared_export (ocf::heartbeat:exportfs) Started
>
> Before the exportfs script calls the exportfs_monitor procedure though, it
> calls the exportfs_validate procedure which verifies the existence of the
> directory to be exported. Since the Filesystem RA has not been started yet,
> the directory /data has not been mounted and /data/shared does not exist.

validate checks if it is being probed and doesn't test if the
directory exists in that case.

> When the directory does not exist, exportfs_validate exits with
> $OCF_ERR_ARGS, which equals 2. This causes the error message,

This is not true, at least with the RA which is now in the
repository. Perhaps you need to update the resource agent.

Thanks,

Dejan

> Apr 23 13:49:58 kong102.kongregate.com pengine: [467]: debug: unpack_rsc_op:
> shared_export_monitor_0 on kong103.kongregate.com returned 2 (invalid
> parameter) instead of the expected value: 7 (not running)
>
> It also causes Pacemaker to give up on the exportfs script as a bad job.
> Once the Filesystem RA has been run though, one can run "crm resource
> reprobe" on the active node and it will properly export /data/shared.
>
> The solution to the problem is to move the call to exportfs_validate below
> the call to exportfs_monitor. I lean towards just putting it in the
> exportfs_start procedure as that is the only procedure that really needs the
> directory to exist.
>
> Are there other opinions?
> Sid
>
>
> On Fri, Apr 23, 2010 at 9:32 AM, Sid Stuart <[email protected]> wrote:
>
> > Hi Florian,
> >
> > I found the problem. The issue is with the RA-exportfs *directory*
> > parameter. If "/data" is configured as the directory it works. If
> > "/data/shared" is configured, it fails.
> > Since the Linux exportfs command
> > will accept "/data/shared" as a valid argument, this is either a bug in the
> > documentation or the script.
> >
> > I will do some more testing and see if I can narrow down where the failure
> > occurs in the script.
> >
> > Thanks for the help,
> > Sid
> >
> >
> > On Fri, Apr 23, 2010 at 3:34 AM, Florian Haas
> > <[email protected]> wrote:
> >
> >> ha-log.txt is empty, messages doesn't contain a single instance of ERROR
> >> (on either host), crm_mon.txt doesn't show any failures. Where's the
> >> problem?
> >>
> >> Cheers,
> >> Florian
> >>
> >> On 2010-04-22 18:02, Sid Stuart wrote:
> >> > My apologies Florian,
> >> >
> >> > For this one, I stopped Heartbeat after 8 AM, cleared the log files,
> >> > started Heartbeat and then ran
> >> >
> >> > hb_report -f 8:00 -u root /tmp/exportfs2
> >> >
> >> > Hopefully, this will have the error.
> >> >
> >> > Sid
> >> >
> >> > On Thu, Apr 22, 2010 at 12:51 AM, Florian Haas <[email protected]>
> >> > wrote:
> >> >
> >> > The ha-log.txt of that tarball contains a single instance of ERROR:
> >> >
> >> > Apr 21 13:06:51 kong102.kongregate.com crmd: [11406]: ERROR:
> >> > verify_stopped: Resource shared_export was active at shutdown. You may
> >> > ignore this error if it is unmanaged.
> >> >
> >> > That's a follow-up issue where Pacemaker has already decided that there
> >> > has been a problem. Please recreate that hb_report where we can actually
> >> > see the error happening.
> >> >
> >> > Unless the file attachment issue has been fixed in the meantime, feel
> >> > free to send that hb_report tarball directly to my personal address
> >> > again. Thanks!
> >> >
> >> > Cheers,
> >> > Florian
> >> >
> >> >
> >> > _______________________________________________________
> >> > Linux-HA-Dev: [email protected]
> >> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> >> > Home Page: http://linux-ha.org/
> >> >
> >>
> >> --
> >> : Florian Haas
> >> : LINBIT | Your Way to High Availability
> >> : Tel: +43-1-8178292-60, Fax: +43-1-8178292-82
> >> :
> >> : http://www.linbit.com
> >>
> >> DRBD® and LINBIT® are registered trademarks of LINBIT.
> >>
> >> This e-mail is solely for use by the intended recipient(s). Information
> >> contained in this e-mail and its attachments may be confidential,
> >> privileged or copyrighted. If you are not the intended recipient you are
> >> hereby formally notified that any use, copying, disclosure or
> >> distribution of the contents of this e-mail, in whole or in part, is
> >> prohibited. Also please notify immediately the sender by return e-mail
> >> and delete this e-mail from your system. Thank you for your co-operation.
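
For reference, the behaviour Dejan describes in his reply (validate notices that it is being probed and skips the directory test) might look roughly like the sketch below. This is not the code from the repository: is_probe and validate_directory are hypothetical names made up for illustration, and a real agent would source ocf-shellfuncs and use its helpers instead. The return codes 2 and 7 are the ones shown in the log message quoted in the thread; treating a probe as a monitor action with interval 0 is an assumption of the sketch.

```shell
#!/bin/sh
# Return codes as they appear in the thread (2 = invalid parameter,
# 7 = not running); 0 is success.
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_NOT_RUNNING=7

# Hypothetical helper: treat a monitor action with interval 0 as a probe.
# $1 = action name, $2 = monitor interval
is_probe() {
    [ "$1" = "monitor" ] && [ "$2" = "0" ]
}

# Hypothetical validate step.
# $1 = action name, $2 = monitor interval, $3 = directory to export
validate_directory() {
    # During a probe the filesystem may not be mounted yet, so a missing
    # directory is not a configuration error; skip the existence test.
    if is_probe "$1" "$2"; then
        return $OCF_SUCCESS
    fi
    # For every other action the directory must exist.
    if [ ! -d "$3" ]; then
        return $OCF_ERR_ARGS
    fi
    return $OCF_SUCCESS
}
```

With this shape, a probe on a node where /data is not yet mounted passes validation, so the monitor code can go on to report "not running" (7) rather than "invalid parameter" (2), while a start on a missing directory still fails with 2.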
