Hi Dave,
        Thank you for the clarifications. Thoughts below. New webrevs:
Full:
http://cr.opensolaris.org/~clayb/webrev.multihomed/dave/2/

Differential:
http://cr.opensolaris.org/~clayb/webrev.multihomed/dave/2/diff

                                        Thank you,
                                        Clay

[snip...]
483-628: The scheme used here is overly complex, as you don't need to
compare addresses to ranges. All that should be needed is to compute
the network address for each SMF exclude_net entry and apply the same
prefix to each interface address to calculate its network address. If
they're equal, then they're the same network; if not, then they're not.

Unfortunately, since either a bare IP address or a CIDR-notated network
may be provided, I think I do need to compare ranges. For example,
because a net_address_v4 SMF property cannot be empty, to allow all
networks by default I pre-populate the networks property with 0.0.0.0/0
and set the exclude_networks property to false.


Reading the comments in the modification, this seems misguided. We don't need to provide for a non-CIDR notation at all; users who for some reason desire to set this to a single address can always use /32.

They can use a /32, but unfortunately the underlying SMF data type also accepts either a bare IP address or a CIDR-notated network address.

Also, the range checking is necessary for the CIDR-notated values. (E.g., is the network 192.168.0.1/24 in the range of 0.0.0.0/0? Though perhaps I'm missing something?)
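
To make the comparison concrete, here is a rough ksh93 sketch of the mask-and-compare scheme described above (the function names are hypothetical, and the exclude entry is assumed to already be split on "/" into a network and a prefix length):

function ip_to_num {
        typeset a b c d
        print "$1" | IFS=. read a b c d
        print $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

function same_network {
        # $1 = interface address, $2 = network address, $3 = prefix length
        typeset mask=$(( $3 == 0 ? 0 : (0xffffffff << (32 - $3)) & 0xffffffff ))
        (( ($(ip_to_num $1) & mask) == ($(ip_to_num $2) & mask) ))
}

same_network 192.168.0.1 0.0.0.0 0 && print "in range"

With a /0 prefix the mask is zero, so 192.168.0.1 masks to 0 and compares equal to 0.0.0.0; the open question is only how to handle a bare address delivered without an explicit prefix.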

[snip...]
Unfortunately, there was more intended than just enumerating interfaces.
Please see my response to Darren for the list of benefits and
drawbacks of the choice to use a shell-script function here. I think a
C approach would be a potential vector for bugs and would be quite
redundant.


Given that argument, I'm frankly disappointed that the judgment wasn't made to convert the script-masquerading-as-a-C-program that is do_create_service. Changing 50% of a function and yet leaving it fundamentally mis-implemented isn't really getting this code where it should be.

I will send you a private follow-up e-mail on the choices made here; though I fully agree with, and like, your description of "[a] script-masquerading-as-a-C-program"!

[snip...]
Will do, but I'm looking to see if anyone has an automated way to
format our XML files, as formatting them by hand into something that
looks good is a rather manual process...


These are infrequently modified, so I'm not sure what benefit you're anticipating from expending that effort on automation.

I was thinking about general gate hygiene, if it could be automated. Talking with the SMF team on IRC, it doesn't sound like there is a good automatic choice. I've hand-formatted the XML back the way it was.

setup-dhcp.sh
[snip...]
92-124: This would be a fair bit simpler if the design were for the
macro to be assembled in an array and then converted to a full
definition only in the case of an overwrite or printing it out. Then
you'd be able to just invoke dhtadm without pulling the macro
definition you just assembled back apart.

I agree. Unfortunately, arrays can't be passed into a ksh function in
any way I can find.

I could modify the callers of add_macro() to assemble an array and then
pass in a string of the collapsed array (and another string with the
array value offsets), repacking it into an array inside the function --
somewhat like the sketch below?
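
A rough sketch of that collapse-and-repack idea, using a separator character rather than offsets (the separator and names are hypothetical, and the separator is assumed never to appear in a macro value):

SEP=';'
function add_macro {
        typeset name=$1 oldifs=$IFS v
        IFS=$SEP
        set -A values $2        # repack the collapsed string
        IFS=$oldifs
        for v in "${values[@]}"; do
                print "$name value: $v"
        done
}

set -A macro_vals "Include=locale" "Timeserv=10.0.0.1"
packed=$(IFS=$SEP; print "${macro_vals[*]}")    # join on SEP
add_macro mymacro "$packed"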

However, I think this is something best left until this code moves to a
more powerful language like Python, where these objects can be
first-class data types and passed around easily.


See my above comment about when to reimplement.

[snip...]
226: we probably should be delivering /etc/netboot in a package; not
clear to me what file mode it'll end up with and so on.

Yes, delivering it in a package would be good. If it were delivered
with wanboot-cgi, then when the consumer of /etc/netboot (wanboot-cgi)
is removed from the system, /etc/netboot and its structure would be
removed along with it. However, this belief might need to change when
ISIM causes /etc/netboot to be used more widely. Please let me know
where you think this should fall, and I can remove the mkdir(1) once
the plan is in place.


Most likely we'll look at this as part of the overall package refactoring in ON - we've talked about creating a base package with directories that should just be delivered once, which everyone can then depend on.

Ok, I'll leave the mkdir in place for now.
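
(For reference, the interim creation amounts to something like the following guarded mkdir; the mode is an assumption:)

[[ -d /etc/netboot ]] || /usr/bin/mkdir -m 755 /etc/netboot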

net-fs-root

42-3: What's the justification for going to these timeout values?

A very good question -- and something that should be documented in the
code too! No, these are wild guesses.

The main reason for them is that if one does a svcadm refresh on the
service, or otherwise bumps the AI setup while a client is booting,
the client previously would not gracefully recover.

The intention here is to provide a 50-second window in which the client
will return to downloading and resume where it left off, if Apache is
upset somehow.

We could set TRIES=0 to keep retrying forever, but that makes debugging
difficult, as one cannot ^C out of the wget to get to console-login if
wget simply loops forever. The TIMEOUT value applies only to DNS-lookup,
connect, and stalled-read timeouts; as such, it should not impact a
working but slow link.


The timeout value seems much too low. We default to 900 seconds, since we haven't set it in the current implementation; reducing it by 90% seems much too aggressive based on our fairly limited experience with the performance aspects here.

Okay; but it seems to me that 900 seconds (15 minutes) is a lot for a single attempt -- by default wget takes 5 hours (15 minutes * 20 tries) to fully time out. You're right that it is a large reduction, though. I think the biggest problem shows up when looking into the tries option:

The default is to retry 20 times, with the exception of fatal errors like "connection refused" or "not found" (404), which are not retried.

So, if the server is restarting (returning "connection refused" in the meantime), we would still bail out. Understanding your concern, and seeing that I reduced the time by 90% and the tries by 50%, I should simply add the --continue flag and wrap the wget somewhat like:

TRIES=20
TIMEOUT=900
# --timeout bounds each DNS lookup, connect and stalled read;
# --continue resumes a partial download on each retry.
cmd="$WGET --continue --tries=1 --timeout=$TIMEOUT"
for file in solaris.zlib solarismisc.zlib; do
        tries=1         # start at 1 so at most $TRIES attempts are made
        while ! $cmd $file && (( tries < TRIES )); do
                print "Retrying wget of $file in $TIMEOUT seconds"
                sleep $TIMEOUT
                tries=$((tries+1))
        done
done

This way the machine tries 20 times regardless of failure type -- and the user gets output about each retry too. We end up with a maximum of 900 sec * 20 tries (wait time -- should this be a delay we ramp up to?) + 900 sec * 20 tries (wget timeout) = 36,000 seconds (or 10 hours) per file; but the admin will see output of the failures and the retrying.

Thank you for raising the point that this initially achieved little of what was intended! It does seem like a goofy long timeout -- but if the machine is not installed, what else would it be doing anyway?