Hi Detlef,

Many thanks for the review and apologies for my late reply.

I agree with most of your comments and rather than just go through  
each line item, I'll just provide my rationale for those items I'm  
unlikely to implement. The others of course I'll change.

Log_message(). I decided against log_message simply because I wanted  
the agent to perform distinct tasks and only worry if non-zero return  
codes are found. I've tried to capture each decision path with an  
appropriate scds_syslog() message.

Commandlog rotation. During my testing it's been my experience that a  
match within commandlog is extremely quick and perhaps my 10 seconds  
"look back" into the commandlog is a little too pessimistic.  
Nevertheless it  serves a purpose so I may just as well leave it in.  
However, the real issue is if the commandlog has rotated. In this  
regard,

"...
      By default, the commandlog file is regularly archived at the
      end  of every week. Sun Cluster maintains up to eight previ-
      ously archived commandlog files on each cluster node at  any
      given time.
..."

My feeling is that the probability of rotation and a switch happening  
at exactly the same time is very unlikely. Before coming to this  
conclusion, I'm mindful of the quick matching I've found and that if a  
match is not found a normal failover will occur, i.e. a graceful  
shutdown followed by startup. Please note this only affects domains  
that are required to migrate or live migrate between nodes and that  
are specifically being switched. In summary, it's my opinion that this  
is a very small corner case that has a safety net anyway via a  
graceful shutdown.

I'm hoping to make the other changes soon and push out another webrev  
along with a tar ball of the agent. I've been compiling a cheat sheet  
which I will also push out at the same time.

Thanks again for your time in reviewing this.

Regards
Neil

On 4 Mar 2008, at 13:49, Detlef Ulherr wrote:

> Hi Neil,
>
> here are some comments for your code review.
>
> xvm_register
>
> Line 55
> -y Resource_dependencies=${HAS_RS} \
> you used -p all over, so you should do it here as well just for  
> consistency.
>
>
> General structure
>
> I think it would be advantage to introduce a lib directory wit the
> infrastructure functions like log_message. This would give us  the
> option to introduce a common area in the gate like the oracle and  
> sybase
> already have.
>
> control_xvm
> function validate, you do not check for empty options with a function
> validate_options. from my point of view, you should check for these
> registration errors to avoid unnecessary ping-pongs at registration  
> time.
>
> functions.ksh
>
> validate:
> you do not check for empty options, so forgetting something like  
> DOMAIN
> at registration time will not cause a validation error, but a start
> error which is more complicated to resolve. I would introduce checks  
> for
> empty parameters. I think you can not assume that there won't be such
> errors.
>
> 93         if [ ! -d ${DOMAIN_PATHNAME} ]
>
> This will result in an illegal statement if DOMAIN_PATHNAME is empty.
>
> 108         if [[ "${FAILOVER_TYPE}" != "normal" && "${FAILOVER_TYPE}"
> != "migrate" && "${FAILOVER_TYPE}" != "migrate --live" ]]
>
> If the FAILOVER_TYPE is "migrate  --live" it will result in an  
> error, I
> doubt that the two spaces instead of one are illegal for the virsh
> command. from my feeling you should tolerate more than one space.
>
> 132         /usr/bin/rm ${LOGFILE} > /dev/null
> Change the > to 2> you will get an output only for stderr.
>
> 155         if /usr/bin/virsh dominfo ${DOMAIN} > /dev/null
> What about stderr? You may consider > /dev/null 2>&1, or capture it  
> in a
> log if you want to preserve it.
>
> 159            if [ -f ${DOMAIN_PATHNAME}/${RESOURCE}.xml ]
> if the variables are empty, this statement does not make any sense,  
> you
> should really validate for empty variables
>
>
> 161                 if /usr/bin/virsh define
> ${DOMAIN_PATHNAME}/${RESOURCE}.xml > /dev/null
> What about stderr? You may consider > /dev/null 2>&1, or capture it  
> in a
> log if you want to preserve it.
>
>
> start_domain
> 172                 else
> 173                    # SCMSGS
> 174                    # @explanation
> 175                    # The domain XML define file does not exist
> 176                    # @user_action
> 177                    # Determine if the domain name is correct or  
> that the
> 178                    # domain has been intially defined.
> typo initially should read initially
> 179                    scds_syslog -p daemon.error -t $(syslog_tag) - 
> m \
> 180                         "%s/%s.xml missing for domain %s" \
> 181                         "${DOMAIN_PATHNAME}" "${RESOURCE}" "$ 
> {DOMAIN}"
> 182
> 183                    return 1
> 184                 fi
>
> You execute your virsh define only if $RESOURCE.xml exists so this
> explanation is obviously wrong, You should complain that it is  
> incorrect
> or whatever.
>
> 260            if /usr/bin/virsh dumpxml ${DOMAIN} >
> ${DOMAIN_PATHNAME}/${RESOURCE}.xml
> Every output from stderr will end up on the console, might be  
> confusing.
> Consider 2>/dev/null.
>
>
> check_commandlog function.
> 556                 /usr/bin/grep -w START | /usr/bin/grep switch |
> /usr/bin/grep -w ${RESOURCEGROUP} |\
>
> Your grep -w will match too many resource groups.
> You search for rg-1 and you have two groups installed rg-1 and rg-1-2,
> your grep will match for both
>
> You check your actual commandlog only, if there is a log rotation  
> within
> your 10 seconds you might miss a command.
> Rotated logs are named like:
> commandlog
> commandlog.2
> commandlog.3
>
> So you should check commandlog.2 if it exists and if the current
> commandlog does not contain anything older than your 10 seconds.
>
>
> 607            /usr/cluster/bin/hatimerun -t ${MAX_MIGRATE_TIMEOUT} -k
> KILL \
> 608                 /usr/sbin/xm ${FAILOVER_TYPE} "${DOMAIN}"
> ${PRIVATELINK_TARGET_HOST} > /dev/null
> 683         if /usr/bin/virsh shutdown ${DOMAIN} > /dev/null
> 750         if /usr/bin/virsh destroy ${DOMAIN} > /dev/null
>
>
> You do not deal with stderr, so it will end up on the console,  
> consider
>> /dev/null 2>&1, or capture it in a log if you want to preserve it.
>
>
> Detlef
>
> -- 
>
> *****************************************************************************
> Detlef Ulherr
> Staff Engineer                                Tel: (++49 6103) 752-248
> Availability Engineering                      Fax: (++49 6103) 752-167
> Sun Microsystems GmbH
> Amperestra?e 6                                mailto:detlef.ulherr at sun.com
> 63225 Langen                                  http://www.sun.de/
> *****************************************************************************
>
> Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551
> Kirchheim-Heimstetten
> Amtsgericht Muenchen: HRB 161028
> Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland  
> Boemer
> Vorsitzender des Aufsichtsrates: Martin Haering
>
> *****************************************************************************
>
>
> _______________________________________________
> ha-clusters-discuss mailing list
> ha-clusters-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss


Reply via email to