This is a tricky problem. I think I have tracked it down, but I am still
working to confirm it, as I don't understand the steps Pacemaker goes through
to start a group.

My understanding of how this works, from the logs, is as follows.
Pacemaker calls "exportfs monitor" as part of an initialization process. It
does this before the preceding RAs in the group are started. The group in
question consists of:

Resource Group: shared
    shared_fs      (ocf::heartbeat:Filesystem)  Started
    shared_ip      (ocf::heartbeat:IPaddr2)     Started
    shared_export  (ocf::heartbeat:exportfs)    Started

Before the exportfs script calls the exportfs_monitor procedure, though, it
calls the exportfs_validate procedure, which verifies the existence of the
directory to be exported. Since the Filesystem RA has not been started yet,
the directory /data has not been mounted and /data/shared does not exist.
When the directory does not exist, exportfs_validate exits with
$OCF_ERR_ARGS, which equals 2. This causes the error message:

    Apr 23 13:49:58 kong102.kongregate.com pengine: [467]: debug: unpack_rsc_op: shared_export_monitor_0 on kong103.kongregate.com returned 2 (invalid parameter) instead of the expected value: 7 (not running)

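To make the failure path concrete, here is a minimal sketch of the ordering described above. It is not the actual exportfs RA code: the function bodies, the run_probe wrapper, and the /nonexistent path are stand-ins I made up for illustration. Only the return codes (2 for invalid parameter, 7 for not running) come from the log message.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real exportfs RA.
OCF_SUCCESS=0
OCF_ERR_ARGS=2       # "invalid parameter" in the log
OCF_NOT_RUNNING=7    # what Pacemaker expects from the initial monitor

# Stand-in for the directory parameter; this path is assumed not to exist,
# mimicking /data/shared before the Filesystem RA has mounted /data.
OCF_RESKEY_directory="/nonexistent/data/shared"

exportfs_validate() {
    # Fails when the directory to be exported is missing.
    [ -d "$OCF_RESKEY_directory" ] || return $OCF_ERR_ARGS
    return $OCF_SUCCESS
}

exportfs_monitor() {
    # Placeholder: nothing is exported yet, so report "not running".
    return $OCF_NOT_RUNNING
}

# Ordering as described: validate runs before the monitor action.
run_probe() {
    exportfs_validate || return $?
    exportfs_monitor
}

run_probe
probe_rc=$?
echo "probe returned $probe_rc"
```

With the directory missing, the validate step short-circuits the probe, so the monitor never gets the chance to report "not running".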
It also causes Pacemaker to give up on the exportfs script as a bad job.
Once the Filesystem RA has been run, though, one can run "crm resource
reprobe" on the active node and it will properly export /data/shared.

The solution to the problem is to move the call to exportfs_validate below
the call to exportfs_monitor. I lean towards just putting it in the
exportfs_start procedure as that is the only procedure that really needs the
directory to exist.
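
As a rough sketch of the second option (validating only inside exportfs_start), the dispatch could look something like the following. Again, this is an illustration under my own assumptions and not a patch against the real script: the dispatch function, the stub bodies, and the /nonexistent path are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed ordering, NOT the real exportfs RA.
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Stand-in path that is assumed not to exist (Filesystem RA not started yet).
OCF_RESKEY_directory="/nonexistent/data/shared"

exportfs_validate() {
    [ -d "$OCF_RESKEY_directory" ] || return $OCF_ERR_ARGS
    return $OCF_SUCCESS
}

exportfs_monitor() {
    # Placeholder: no validation here, so a probe can answer "not running"
    # even though the directory is not mounted yet.
    return $OCF_NOT_RUNNING
}

exportfs_start() {
    # Validation moved here: start is the action that needs the directory.
    exportfs_validate || return $?
    # A real start would invoke the exportfs command at this point.
    return $OCF_SUCCESS
}

dispatch() {
    case "$1" in
        start)   exportfs_start ;;
        monitor) exportfs_monitor ;;
        *)       return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}

dispatch monitor
monitor_rc=$?
dispatch start
start_rc=$?
echo "monitor=$monitor_rc start=$start_rc"
```

In this arrangement the probe returns 7 as Pacemaker expects, while a start attempted before the mount still fails with 2.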
Are there other opinions?
Sid
On Fri, Apr 23, 2010 at 9:32 AM, Sid Stuart <[email protected]> wrote:
> Hi Florian,
>
> I found the problem. The issue is with the RA-exportfs *directory*
> parameter. If "/data" is configured as the directory it works. If
> "/data/shared" is configured, it fails. Since the Linux exportfs command
> will accept "/data/shared" as a valid argument, this is either a bug in the
> documentation or the script.
>
> I will do some more testing and see if I can narrow down where the failure
> occurs in the script.
>
> Thanks for the help,
> Sid
>
>
> On Fri, Apr 23, 2010 at 3:34 AM, Florian Haas <[email protected]> wrote:
>
>> ha-log.txt is empty, messages doesn't contain a single instance of ERROR
>> (on either host), crm_mon.txt doesn't show any failures. Where's the
>> problem?
>>
>> Cheers,
>> Florian
>>
>> On 2010-04-22 18:02, Sid Stuart wrote:
>> > My apologies Florian,
>> >
>> > For this one, I stopped Heartbeat after 8 AM, cleared the log files,
>> > started Heartbeat and then ran
>> >
>> > hb_report -f 8:00 -u root /tmp/exportfs2
>> >
>> > Hopefully, this will have the error.
>> >
>> > Sid
>> >
>> > On Thu, Apr 22, 2010 at 12:51 AM, Florian Haas <[email protected]> wrote:
>> >
>> > The ha-log.txt of that tarball contains a single instance of ERROR:
>> >
>> > Apr 21 13:06:51 kong102.kongregate.com crmd: [11406]: ERROR:
>> > verify_stopped: Resource shared_export was active at shutdown. You may
>> > ignore this error if it is unmanaged.
>> >
>> > That's a follow-up issue where Pacemaker has already decided that there
>> > has been a problem. Please recreate that hb_report where we can actually
>> > see the error happening.
>> >
>> > Unless the file attachment issue has been fixed in the meantime, feel
>> > free to send that hb_report tarball directly to my personal address
>> > again. Thanks!
>> >
>> > Cheers,
>> > Florian
>> >
>> >
>> > _______________________________________________________
>> > Linux-HA-Dev: [email protected]
>> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> > Home Page: http://linux-ha.org/
>> >
>> >
>>
>> --
>> : Florian Haas
>> : LINBIT | Your Way to High Availability
>> : Tel: +43-1-8178292-60, Fax: +43-1-8178292-82
>> :
>> : http://www.linbit.com
>>
>