Hi Detlef,
Many thanks for the review and apologies for my late reply.
I agree with most of your comments and rather than just go through
each line item, I'll just provide my rationale for those items I'm
unlikely to implement. The others of course I'll change.
Log_message(). I decided against log_message simply because I wanted
the agent to perform distinct tasks and only worry if non-zero return
codes are found. I've tried to capture each decision path with an
appropriate scds_syslog() message.
Commandlog rotation. During my testing it's been my experience that a
match within commandlog is extremely quick and perhaps my 10 seconds
"look back" into the commandlog is a little too pessimistic.
Nevertheless it serves a purpose so I may just as well leave it in.
However, the real issue is if the commandlog has rotated. In this
regard,
"...
By default, the commandlog file is regularly archived at the
end of every week. Sun Cluster maintains up to eight previ-
ously archived commandlog files on each cluster node at any
given time.
..."
My feeling is that the probability of rotation and a switch happening
at exactly the same time is very unlikely. Before coming to this
conclusion, I'm mindful of the quick matching I've found and that if a
match is not found a normal failover will occur, i.e. a graceful
shutdown followed by startup. Please note this only affects domains
that are required to migrate or live migrate between nodes and that
are specifically being switched. In summary, it's my opinion that this
is a very small corner case that has a safety net anyway via a
graceful shutdown.
I'm hoping to make the other changes soon and push out another webrev
along with a tar ball of the agent. I've been compiling a cheat sheet
which I will also push out at the same time.
Thanks again for your time in reviewing this.
Regards
Neil
On 4 Mar 2008, at 13:49, Detlef Ulherr wrote:
> Hi Neil,
>
> here are some comments for your code review.
>
> xvm_register
>
> Line 55
> -y Resource_dependencies=${HAS_RS} \
> you used -p all over, so you should do it here as well just for
> consistency.
>
>
> General structure
>
> I think it would be advantage to introduce a lib directory wit the
> infrastructure functions like log_message. This would give us the
> option to introduce a common area in the gate like the oracle and
> sybase
> already have.
>
> control_xvm
> function validate, you do not check for empty options with a function
> validate_options. from my point of view, you should check for these
> registration errors to avoid unnecessary ping-pongs at registration
> time.
>
> functions.ksh
>
> validate:
> you do not check for empty options, so forgetting something like
> DOMAIN
> at registration time will not cause a validation error, but a start
> error which is more complicated to resolve. I would introduce checks
> for
> empty parameters. I think you can not assume that there won't be such
> errors.
>
> 93 if [ ! -d ${DOMAIN_PATHNAME} ]
>
> This will result in an illegal statement if DOMAIN_PATHNAME is empty.
>
> 108 if [[ "${FAILOVER_TYPE}" != "normal" && "${FAILOVER_TYPE}"
> != "migrate" && "${FAILOVER_TYPE}" != "migrate --live" ]]
>
> If the FAILOVER_TYPE is "migrate --live" it will result in an
> error, I
> doubt that the two spaces instead of one are illegal for the virsh
> command. from my feeling you should tolerate more than one space.
>
> 132 /usr/bin/rm ${LOGFILE} > /dev/null
> Change the > to 2> you will get an output only for stderr.
>
> 155 if /usr/bin/virsh dominfo ${DOMAIN} > /dev/null
> What about stderr? You may consider > /dev/null 2>&1, or capture it
> in a
> log if you want to preserve it.
>
> 159 if [ -f ${DOMAIN_PATHNAME}/${RESOURCE}.xml ]
> if the variables are empty, this statement does not make any sense,
> you
> should really validate for empty variables
>
>
> 161 if /usr/bin/virsh define
> ${DOMAIN_PATHNAME}/${RESOURCE}.xml > /dev/null
> What about stderr? You may consider > /dev/null 2>&1, or capture it
> in a
> log if you want to preserve it.
>
>
> start_domain
> 172 else
> 173 # SCMSGS
> 174 # @explanation
> 175 # The domain XML define file does not exist
> 176 # @user_action
> 177 # Determine if the domain name is correct or
> that the
> 178 # domain has been intially defined.
> typo initially should read initially
> 179 scds_syslog -p daemon.error -t $(syslog_tag) -
> m \
> 180 "%s/%s.xml missing for domain %s" \
> 181 "${DOMAIN_PATHNAME}" "${RESOURCE}" "$
> {DOMAIN}"
> 182
> 183 return 1
> 184 fi
>
> You execute your virsh define only if $RESOURCE.xml exists so this
> explanation is obviously wrong, You should complain that it is
> incorrect
> or whatever.
>
> 260 if /usr/bin/virsh dumpxml ${DOMAIN} >
> ${DOMAIN_PATHNAME}/${RESOURCE}.xml
> Every output from stderr will end up on the console, might be
> confusing.
> Consider 2>/dev/null.
>
>
> check_commandlog function.
> 556 /usr/bin/grep -w START | /usr/bin/grep switch |
> /usr/bin/grep -w ${RESOURCEGROUP} |\
>
> Your grep -w will match too many resource groups.
> You search for rg-1 and you have two groups installed rg-1 and rg-1-2,
> your grep will match for both
>
> You check your actual commandlog only, if there is a log rotation
> within
> your 10 seconds you might miss a command.
> Rotated logs are named like:
> commandlog
> commandlog.2
> commandlog.3
>
> So you should check commandlog.2 if it exists and if the current
> commandlog does not contain anything older than your 10 seconds.
>
>
> 607 /usr/cluster/bin/hatimerun -t ${MAX_MIGRATE_TIMEOUT} -k
> KILL \
> 608 /usr/sbin/xm ${FAILOVER_TYPE} "${DOMAIN}"
> ${PRIVATELINK_TARGET_HOST} > /dev/null
> 683 if /usr/bin/virsh shutdown ${DOMAIN} > /dev/null
> 750 if /usr/bin/virsh destroy ${DOMAIN} > /dev/null
>
>
> You do not deal with stderr, so it will end up on the console,
> consider
>> /dev/null 2>&1, or capture it in a log if you want to preserve it.
>
>
> Detlef
>
> --
>
> *****************************************************************************
> Detlef Ulherr
> Staff Engineer Tel: (++49 6103) 752-248
> Availability Engineering Fax: (++49 6103) 752-167
> Sun Microsystems GmbH
> Amperestra?e 6 mailto:detlef.ulherr at sun.com
> 63225 Langen http://www.sun.de/
> *****************************************************************************
>
> Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551
> Kirchheim-Heimstetten
> Amtsgericht Muenchen: HRB 161028
> Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland
> Boemer
> Vorsitzender des Aufsichtsrates: Martin Haering
>
> *****************************************************************************
>
>
> _______________________________________________
> ha-clusters-discuss mailing list
> ha-clusters-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss