Hi Dave,
        Thank you for the clarifications. Thoughts below. New webrevs:
Full:
http://cr.opensolaris.org/~clayb/webrev.multihomed/dave/2/

Differential:
http://cr.opensolaris.org/~clayb/webrev.multihomed/dave/2/diff

                                        Thank you,
                                        Clay

[snip...]
483-628: The scheme used here is overly complex, as you don't need to
compare addresses to ranges. All that should be needed is to compute
the network address for each SMF exclude_net entry and apply the same
prefix to each interface address to calculate its network address. If
they're equal, then they're the same network; if not, then they're not.

Unfortunately, since either a bare IP address or a CIDR-notated network
may be provided, I think I do need to compare ranges. For example,
because a net_address_v4 SMF property cannot be empty, to allow all
networks by default I pre-populate the networks property with 0.0.0.0/0
and set the exclude_networks property to false.


Reading the comments in the modification, this seems misguided. We don't need to provide for a non-CIDR notation at all; users who for some reason desire to set this to a single address can always use /32.

They can use a /32, but unfortunately the underlying SMF data type also accepts either a bare IP address or a CIDR-notated network address.

Also, the range checking is necessary for the CIDR-notated values. (E.g., is the network 192.168.0.1/24 in the range of 0.0.0.0/0? Though perhaps I'm missing something?)
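
To make the comparison concrete, here is a rough ksh93 sketch of the mask-and-compare scheme described above (the function names are hypothetical, and the exclude entry is assumed to already be split on "/" into a network and a prefix length):

function ip_to_num {
        typeset a b c d
        print "$1" | IFS=. read a b c d
        print $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

function same_network {
        # $1 = interface address, $2 = network address, $3 = prefix length
        typeset mask=$(( $3 == 0 ? 0 : (0xffffffff << (32 - $3)) & 0xffffffff ))
        (( ($(ip_to_num $1) & mask) == ($(ip_to_num $2) & mask) ))
}

same_network 192.168.0.1 0.0.0.0 0 && print "in range"

With a /0 prefix the mask is zero, so 192.168.0.1 masks to 0 and compares equal to 0.0.0.0; the open question is only how to handle a bare address delivered without an explicit prefix.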

[snip...]
Unfortunately, there was more intended than just enumerating interfaces.
Please see my response to Darren for the list of benefits and
drawbacks of the choice to use a shell-script function here. I think a
C approach would be a potential vector for bugs and would be quite
redundant.


Given that argument, I'm frankly disappointed that the judgment wasn't made to convert the script-masquerading-as-a-C-program that is do_create_service. Changing 50% of a function and yet leaving it fundamentally mis-implemented isn't really getting this code where it should be.

I will send you a private follow-up e-mail on the choices made here; though I fully agree with, and like, your description of "[a] script-masquerading-as-a-C-program"!

[snip...]
Will do, but I'm looking to see if anyone has an automated way to
format our XML files, as formatting them by hand into something that
looks good is a rather manual process...


These are infrequently modified, so I'm not sure what benefit you're anticipating from expending that effort on automation.

I was thinking about general gate hygiene, if it could be automated. Talking with the SMF team on IRC, it doesn't sound like there is a good automatic choice. I've hand-formatted the XML back the way it was.

setup-dhcp.sh
[snip...]
92-124: This would be a fair bit simpler if the design were for the
macro to be assembled in an array and then converted to a full
definition only in the case of an overwrite or printing it out. Then
you'd be able to just invoke dhtadm without pulling the macro
definition you just assembled back apart.

I agree. Unfortunately, arrays can't be passed into a ksh function in
any way I can find.

I could modify the callers of add_macro() to assemble an array and then
pass in a string of the collapsed array (and another string with the
array value offsets), repacking it into an array inside the function --
somewhat like the sketch below?
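
A rough sketch of that collapse-and-repack idea, using a separator character rather than offsets (the separator and names are hypothetical, and the separator is assumed never to appear in a macro value):

SEP=';'
function add_macro {
        typeset name=$1 oldifs=$IFS v
        IFS=$SEP
        set -A values $2        # repack the collapsed string
        IFS=$oldifs
        for v in "${values[@]}"; do
                print "$name value: $v"
        done
}

set -A macro_vals "Include=locale" "Timeserv=10.0.0.1"
packed=$(IFS=$SEP; print "${macro_vals[*]}")    # join on SEP
add_macro mymacro "$packed"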

However, I think this is something best left until this code moves to a
more powerful language like Python, where these objects can be
first-class data types and passed around easily.


See my above comment about when to reimplement.

[snip...]
226: we probably should be delivering /etc/netboot in a package; not
clear to me what file mode it'll end up with and so on.

Yes, delivering it in a package would be good. If it were delivered
with wanboot-cgi, then when the consumer of /etc/netboot (wanboot-cgi)
is removed from the system, /etc/netboot and its structure would be
removed along with it. However, this belief might need to change when
ISIM causes /etc/netboot to be used more widely. Please let me know
where you think this should fall, and I can remove the mkdir(1) once
the plan is in place.


Most likely we'll look at this as part of the overall package refactoring in ON - we've talked about creating a base package with directories that should just be delivered once, which everyone can then depend on.

Ok, I'll leave the mkdir in place for now.
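
(For reference, the interim creation amounts to something like the following guarded mkdir; the mode is an assumption:)

[[ -d /etc/netboot ]] || /usr/bin/mkdir -m 755 /etc/netboot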

net-fs-root

42-3: What's the justification for going to these timeout values?

A very good question -- and something that should be documented in the
code too! No, these are wild guesses.

The main reason for them is that if one does a svcadm refresh on the
service, or otherwise bumps the AI setup while a client is booting,
the client previously would not gracefully recover.

The intention here is to provide a 50-second window in which the client
will return to downloading and resume where it left off, if Apache is
upset somehow.

We could set TRIES=0 to keep retrying forever, but that makes debugging
difficult, as one cannot ^C out of the wget to get to console-login if
wget simply loops forever. The TIMEOUT value applies only to DNS-lookup,
connect, and stalled-read timeouts; as such, it should not impact a
working but slow link.


The timeout value seems much too low. We default to 900 seconds, since we haven't set it in the current implementation; reducing it by 90% seems much too aggressive based on our fairly limited experience with the performance aspects here.

Okay; but it seems to me that 900 seconds (15 minutes) is a lot for a single attempt -- by default wget takes 5 hours (15 minutes * 20 tries) to fully time out. You're right that it is a large reduction, though. I think the biggest problem shows up when looking into the tries option:

The default is to retry 20 times, with the exception of fatal errors like "connection refused" or "not found" (404), which are not retried.

So, if the server is restarting (returning "connection refused" in the meantime), we would still bail out. Understanding your concern, and seeing that I reduced the time by 90% and the tries by 50%, I should simply add the --continue flag and wrap the wget somewhat like:

TRIES=20
TIMEOUT=900
# --timeout bounds each DNS lookup, connect and stalled read;
# --continue resumes a partial download on each retry.
cmd="$WGET --continue --tries=1 --timeout=$TIMEOUT"
for file in solaris.zlib solarismisc.zlib; do
        tries=1         # start at 1 so at most $TRIES attempts are made
        while ! $cmd $file && (( tries < TRIES )); do
                print "Retrying wget of $file in $TIMEOUT seconds"
                sleep $TIMEOUT
                tries=$((tries+1))
        done
done

This way the machine tries 20 times regardless of failure type -- and the user gets output about each retry too. We end up with a maximum of 900 sec * 20 tries (wait time -- should this be a delay we ramp up to?) + 900 sec * 20 tries (wget timeout) = 36,000 seconds (or 10 hours) per file; but the admin will see output of the failures and the retrying.

Thank you for raising the point that this initially achieved little of what was intended! It does seem like a goofy long timeout -- but if the machine is not installed, what else would it be doing anyway?