Hi Thorsten,
Many thanks for the detailed review. I really appreciate the time you've
taken to go through this.
After a few reviews now I will produce a code refresh of this agent and
will ensure that all the packaging requirements are in place so that it
will build. I'll email again when this is done.
If I may I have included some comments below.
Regards
Neil
Thorsten Frueauf wrote:
>Hi Neil,
>
>I still could not find the necessary changes within usr/src/pkgdefs/* in
>your webrev - so it is not complete and I doubt it will properly build
>as-is.
>Which means my review is not yet complete.
>
>Apart from that, here is my review to the existing files:
>
>general: most scripts do not have the .ksh extension, although your
>Makefile does expect them!
>
>usr/src/cmd/ha-services/gds-agents/ids/control_ids
> - your file is named control_ids, but the Makefile really expects
> control_ids.ksh - need to rename!
>
>
>
Agreed.
> - line 108, since you later "return 1", doesn't this need to be
> of servity daemon.error instead of daemon.notice?
> The SCMSGS is missing because this is used with other agents
>as well?
>
>
Agreed.
> - line 133, although you setup $OUTPUT (which logs into
>${LOGFILE}),
> where is that output ever been printed out? In a former gds
>template
> we had log_message(), which you don't have. What is logging
> $LOGFILE to syslog in the end?
>
>
>
My initial thought was to do away with log_message() and simply rely on
the return code from whatever command was being issued. However, if I
follow that thought through then within set_redirection() I could just
have /dev/null instead of ${LOGFILE}, i.e. discard any output from
whatever command is being run and just rely on the return code.
If I recall correctly we coded log_message() to capture all output from
a command. The benefit from doing so was that we could output, via
log_message(), to syslog. The further benefit of this is that we would
also capture any undocumented error messages.
With this in mind I'll reinstate log_message(). Thanks for questioning this.
>usr/src/cmd/ha-services/gds-agents/ids/functions:
> - your file is named functions, but the Makefile really expects
> functions.ksh - need to rename!
>
>
>
Agreed.
> - line 95 and 109, your are hard coding "informix" as user and
> group that must exist. Is that really a Informix requirement?
> Wouldn't it make sense to make it configurable if this changes
> for future versions? If not, can you please add an comment that
> Informix really requires this, maybe with a pointer to the docs?
>
>
>
Yes, group and userid informix is a requirement I'll add a pointer to
the docs and will also consider making it configurable.
> - line 103 and 117, the @user_action is not helpfull. Does this
>mean the
> user must create this user manually? Or is this something
>performed
> by the Informix installation procedure?
>
>
Agreed, I'll update the @user_action.
> - line 127, if the user does not exists, wouldn't that create an
> error message to stderr (ie. show on the console)?
>
>
I don't believe it will, but I will check.
> - line 129, your if clause looks strange - what are you
>checking for?
> Maybe there is a "-n" missing?
>
>
Looking at the man page for test I can see the following,
-n string True if the length of string is non-zero.
string True if the string string is not the null string.
However, I'll amend this and other similar entries to include "-n".
> - line 181, you should quote ${INFORMIXDIR}
>
>
Agreed.
> - line 192, missing \ at the end of the line!
>
>
Thanks.
> - line 193, you should quote ${INFORMIXDIR}
>
>
Agreed.
> - line 197, maybe a comment would be helpful as of how the
> ONCONFIG file looks like as an example - that would help to
> understand what you validate for in the following.
>
>
>
Agreed.
> - line 201 and 216 and 246, note that your grep would also match
> names where the one you rep for is included, but does have a
> different pre- or postfix - it is not checking for an exact
>match.
> Is that intended?
>
>
No, I'll include "-w".
> - line 281, doesn't the rm create a failure message to stderr if
> ${LOGFILE} does not exist? Maybe redirect the output to
>/dev/null?
>
>
Okay.
> - line 294 - when would check_ids actually return 1?
> Looking at the function I can not see where.
>
>
You're right. I'm going to rework check_ids() to remove tolerating
"maintenance" and "startup" modes. I now feel this is maybe confusing by
not offering a clear separation between how a cluster should manage an
IDS service and how a DBA should manage an IDS service, i.e. if any DBA
work is required then that should be coordinated with those managing IDS
within a cluster. I will however still tolerate some blocked states.
> - line 32ff., I am a bit surprised by the user_action, especially
> "The Solaris Cluster resource will attempt to restart the
>Informix Server."
> That is not always correct, is it? If "oninit -y" always
>fails, then
> this will result into failed or stop_failed?
>
>
Agreed, I'll amend the @user_action.
> - line 349, the servity for syslog should be "daemon.error"
>
>
>
Agreed.
> - line 408/410 - can you explain with a comment why you do the
>"grep ."?
>
>
Okay.
> - general on check_ids() - you rely a lot on the output of onstat.
> Is that a stable interface to parse? If not, any way to get the
> various status via a stable interface?
>
>
>
Yes, it's my belief that onstat is a stable interface.
> - general on check_ids() - you nearly never quote the variables you
> provide to scds_syslog - you should quote them!
>
>
Agreed.
> - line 469, missing " before HANG_SYSTEM"
>
>
Agreed.
> - line 485, in the comment you say with the "HANG_SYSTEM" or
> "MEDIA FAILURE" the DBA intervention is necessary - but you keep
> restarting the resource?
>
>
I'd prefer to just disable the resource in this sceanrio. However with
returning 0-100 or 201 from a GDS probe I couldn't determine a better
alternative to 100 as I'd like to keep Failover_enabled=true.
> - line 490-502, I find the comment confusing - you say
>something about
> return code 1 - but in the following code it well rc will never
> be set to 1? In fact you either set 100 or 0. Is that intended?
> If so, please correct the comment.
>
>
Agreed. As outlined above I'll rework check_ids() to not tolerate
"maintenance" and "startup" nodes as I now think it will be confusing.
> - line 508-521, same thing as with 490-502 - I do not fully
>understand
> the comment return code 1.
>
>
Agreed.
> - line 567 and 568, the check is not specifying an operation,
> don't you need -n in the [ ] test?
>
>
>
Agreed.
> - line 634ff - again the confusion about return 1 - I have not yet
> understood when check_ids() does return 1 in the code.
>
> - line 669-674 distincts between zone and non-zone systems,
> while later (line 677ff) you always call the -mcopbZ to ipcs.
> Is that intended? Doesn't this limit the use of the agent to
> Solaris 10 and newer? The whole block line 677-704 is S10 only.
>
>
>
Yes, the objective is to only deploy this agent on Solaris 10 or newer.
I'll amend to code to reflect this.
>- usr/src/cmd/ha-services/gds-agents/ids/ids_register
> - your file is named ids_register, but the Makefile really expects
> ids_register.ksh - need to rename!
>
>
>
Sigh!
> - line 29, you should not put temp files into /opt. I suggest
> you use /tmp or /var/tmp for that. but I guess you need it there
> for the register within the non-global zone / HA Container case.
> In that case, why not transfer it (again to /tmp or /var/tmp) of
> the non-global zone via the method shown at line 224 of
>
>http://src.opensolaris.org/source/xref/ohac/ohacds/usr/src/cmd/ha-services/gds-agents/mys/ha_mysql_register.ksh
>
>
Agreed.
> - line 81, 87, 93, 102, 137, 142 and 151 does always exit 1 without
> cleaning up the temp file - which would disturb a clean pkgrm.
> This is also an argument to place it on /tmp or /var/tmp.
>
>
I'll amend this.
> - line 104, you need to grep for "^Zonename=", otherwise you might
> match a comment or old commented out entry.
>
>
Thanks.
> - line 110-118 - do you really want the leading tab there?
>
>
I think it's easier to read this way but I'm also happy to amend it.
> - line 129, you just check if the state of the resource is
>"OFFLINE".
> But the "clrs delete ${RS}" does only succeed if the resource
> is also disabled - so you also need to check for -O ON_OFF_SWITCH
> via scha_resource_get if the resource is DISABLED.
>
>
Agreed.
>
>- usr/src/cmd/ha-services/gds-agents/ids/ids_smf_register
> - your file is named ids_smf_register, but the Makefile really
>expects
> ids_smf_register.ksh - need to rename!
>
>
>
Hmm.
> - line 125, you should check if $1 is really existing.
>
>
>
Agreed.
>- usr/src/cmd/ha-services/gds-agents/ids/ids_smf_remove
> - your file is named ids_smf_remove, but the Makefile really
>expects
> ids_smf_remove.ksh - need to rename!
>
>
Thanks.
> - line 44 and 50, you should quote the variables
>
>
Agreed.
> - line 56, shouldn't the check be
> [ "${rc}" -ne 0 ] && exit ${rc}
>
>
>
True.
>All for now.
>
>
>
Many thanks again Thorsten.
>Greets
> Thorsten
>
>Thorsten Frueauf wrote:
>
>
>>Hi Neil et al,
>>
>>one early comment about the webrev, while browsing through it I think
>>you are missing the package source part within usr/src/pkgdefs/* for the
>>new ids package.
>>
>>Greets
>> Thorsten
>>
>>Neil Garthwaite wrote:
>>
>>
>>>Hi Everyone,
>>>
>>>The HA-Informix project page
>>>http://opensolaris.org/os/project/ha-informix has been updated to
>>>include the initial code for the HA-Informix agent.
>>>
>>>I'd welcome a code review for this agent, a webrev can be found here
>>>
>>> http://cr.opensolaris.org/~neilg/ohacds-informix
>>>
>>>Alternatively, the agent is available for download. Please use the
>>>accompanying cheatsheet if you want to try out the agent.
>>>
>>> http://opensolaris.org/os/project/ha-informix/cheatsheet_ohac_ids.pdf
>>>
>>>Finally, many thanks to everyone that reviewed the Design Doc and Caller
>>>Sequence Doc, I've updated those docs to reflect your comments.
>>>
>>>Regards
>>>Neil
>>>
>>>
>
>
>