[snipped down to things not already answered via other email]
...
>>>>> 2) It has spotty support for upgrading to metadevices.
>>>
>>> I am pretty sure that there is a bug on this one, but I am having
>>> trouble finding it.  Essentially, it boils down to the following
>>> blowing up:
>>>
>>> lucreate -s - -n newbe -m /:d30:ufs,preserve
>>> luupgrade -f -n newbe -s $osmedia -J 'archive_location nfs://somewhere'
>>>
>>> To work around this, I have done:
>>>
>>> # cp $osmedia/Solaris_10/Tools/Boot/usr/sbin/install.d/pfinstall \
>>>     /var/tmp/pfinstall.orig
>>> # mount -F lofs -O /dir/pfinstall-wrapper \
>>>     $osmedia/Solaris_10/Tools/Boot/usr/sbin/install.d/pfinstall
>>>
>>> The wrapper causes the following change in the profile before calling
>>> /var/tmp/pfinstall.orig:
>>>
>>> < filesys d30 existing /
>>> ---
>>> > filesys mirror:d30 c0t0d0s3 c0t1d0s3 existing /
>>> > metadb c0t0d0s7
>>> > metadb c0t1d0s7
>>>
>>> Note that this has worked for me on one machine, and I only just got
>>> it working in the past 24 hours.  By no means am I convinced that it
>>> is a robust workaround yet.
>>>
>> Kind of a scary workaround, but it'll be good input to the bug report.
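For what it's worth, a wrapper along the lines you describe might look
roughly like the sketch below.  This is only a guess at the mechanics:
it assumes the profile is the last argument install.d/pfinstall is
invoked with (worth verifying against the real invocation), and the
d30/c0t0d0s3/c0t1d0s3 and metadb names are simply lifted from your diff.

    #!/bin/sh
    #
    # /dir/pfinstall-wrapper -- lofs-mounted over install.d/pfinstall.
    # Rewrites the profile so that / lands on the SVM mirror, then execs
    # the saved copy of the real pfinstall.
    #
    # ASSUMPTION: the profile is the last command-line argument; check
    # how install.d/pfinstall is really invoked before trusting this.

    profile=""
    for arg in "$@"; do
        profile="$arg"
    done

    if [ -n "$profile" -a -f "$profile" ]; then
        sed 's|^filesys d30 existing /$|filesys mirror:d30 c0t0d0s3 c0t1d0s3 existing /|' \
            "$profile" > "$profile.$$" && mv "$profile.$$" "$profile"
        # the mirror also needs state database replicas; appending the
        # metadb lines at the end of the profile should be equivalent
        # to the diff shown above
        grep '^metadb ' "$profile" > /dev/null 2>&1 || {
            echo 'metadb c0t0d0s7' >> "$profile"
            echo 'metadb c0t1d0s7' >> "$profile"
        }
    fi

    exec /var/tmp/pfinstall.orig "$@"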
> Very much so.  I wish that the source code for live upgrade was
> available...
>
We're working on the plan for getting the rest of the install code out.
It's about 10 times the size of the packaging code, so it's not a
trivial task, and there are a couple of legal issues to be researched.
It'll probably be at least 4 months, and that's perhaps optimistic ;-)

>>> Beyond that, I found the following problems going from S9 to S10:
>>>
>>> 1) netgroup entries in /etc/shadow were missing but they were in
>>>    /etc/passwd.
>>> 2) Solaris 10 should have more default password entries than
>>>    Solaris 9 (gdm, webservd, etc.).  These were lost.
>>> 3) swap and other metadevices were commented out of vfstab
>>> 4) mount points for lofs file systems were missing
>>> 5) Complaints about svc:/system/cvc:default in maintenance mode when
>>>    it was not appropriate for the platform (should not have been
>>>    enabled)
>>> 6) SVM related services are not enabled
>>> 7) Applying JASS to the new boot environment looked kinda scary when
>>>    it started out by complaining about shared library problems when
>>>    calling zonename.
>>>
>> Was this going to S10 1/06?  That last one looks like something that
>> should occur only in that case.
>
> It was S10 1/06.  Running JASS after reboot was just fine, though.  It
> just speaks to the point that live upgrade could really stand to run
> in its own virtual machine.
>
I wouldn't expect it to have been a problem if the LU-required patches
were applied, though.

>> Seems like a number of class-action scripts didn't work right, though.
>> Feels like something really basic went wrong in the installation.
>
> By class action, I assume you mean post-install scripts, right?  My
> understanding was that these should run only after a pkgadd, not when
> a flash archive is applied.  Or are there other scripts that I am not
> aware of?
>
Class-action scripts (CAS's is how you'll often see them referenced)
are different from postinstall scripts.  Every file we install has a
class it's associated with; most are in "none", which means they just
get copied from the package and have their ownership and mode adjusted
appropriately.  But when there's special handling required, they are
placed in some other named class and a script is written to do the
special work.  Config files usually need to be manipulated in this way
so that customizations are preserved across upgrades.  (There's a small
sketch of how the pieces fit together below.)

> The netgroup thingy seems to be related to a poorly documented feature
> that got new behavior with somewhat recent updates to PAM.
> Previously, netgroup entries were not required in /etc/shadow.
> Frankly, it wouldn't surprise me to see this one fall through the
> cracks.
>
> The fact that passwd was missing some entries seems like an
> over-zealous sync task.
>
> Mount points and vfstab problems surprised me.  I haven't seen this
> problem before.
>
> Because the flar was generated on a 15K, having system/cvc enabled was
> not terribly surprising.
>
> SVM related services not being enabled is a problem for regular
> jumpstarts as well.
>
One thing I'm realizing here is that you seem to have installed the new
boot environment using a flash archive, right?  If so, the class-action
scripts don't get run during that install, as they would have been run
on the original installation of the master system.  More likely, some
of this may be the result of the synchronization between boot
environments that LU does for you after you luactivate and reboot
(synclist(4) talks about how this is done).  You might want to see
what's in /etc/lu/sync.log.
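To make the class-action distinction above a bit more concrete, here is
roughly what the pieces look like in an SVR4 package.  The class name
"mycfg", the example pathname, and the keep-the-old-copy behavior are
made up purely for illustration; the real classes and scripts in the
Solaris packages do this sort of thing with considerably more care.

    #!/bin/sh
    #
    # i.mycfg -- a toy install class-action script.  The package's
    # prototype file would put the config file in this class with a
    # line like:
    #
    #   f mycfg /etc/opt/EXAMPLEpkg/example.conf 0644 root sys
    #
    # pkgadd runs i.mycfg with "src dst" pathname pairs on stdin, one
    # pair per file in the class.
    while read src dst; do
        if [ -f "$dst" ]; then
            # a (possibly customized) copy already exists: keep it and
            # park the new default alongside it
            cp "$src" "$dst.new" || exit 2
        else
            cp "$src" "$dst" || exit 2
        fi
    done
    exit 0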
>>>> I expect we'll fix the fragmentation between x86 and SPARC by going
>>>> to GRUB on SPARC as well.  Long term, I think the model is better
>>>> for most people, but the transition could have been handled better,
>>>> I agree.
>>>
>>> This oughta be interesting...  Is this part of making zfs bootable
>>> (that is, is it easier to write the bootstrap code for grub than it
>>> is for openboot?)
>>>
>> Yes, it's very much related to the zfs boot support.  As anyone who
>> tried to use WAN installation found, getting new OBP features released
>> on all the platforms is very difficult.
>
> I got excited about wanboot when I first read about it in ~2002 and
> was disappointed to see no OpenBoot updates to support it.  The first
> platform I have seen ship with network-boot-params as a variable in
> nvram is a T2000.  In the meantime I completely gave up on the
> technology.  There are some cases where I may start looking at it
> again.
>
It should work on all the newer, smaller systems, but I certainly
understand why people may have given up considering the delay in
getting the OBP updates out there.  It still doesn't work on 15K's and
6900's and their ilk.

>>>>> Optimization of network performance is sometimes a matter of
>>>>> optimizing the size of installation media.  If something like
>>>>> flash archives continues to exist, they should use a better
>>>>> compression tool than compress(1).
>>>>
>>>> Sure, providing options here seems reasonable.
>>>
>>> A key here may be to devise a file format that chunks a data stream
>>> into lots of somewhat large pieces that are individually compressed,
>>> using the compression algorithm that gives the right mix of speed
>>> and size.  When the data stream is being extracted, the various
>>> chunks could be individually uncompressed on multiple hardware
>>> threads.
>>>
>> Seems like it may be overkill based on likely source and destination
>> bandwidth, but an interesting idea.
>
> Within a year I bet my NFS file servers are on 10 gigabit.  "Jumpstart
> clients" are increasingly getting faster internal disks, using SAN
> boot, or possibly using iSCSI to a decent array.  At the same time,
> single-threaded CPU performance seems to have hit a brick wall in
> favor of multi-threaded designs.
>
No disagreement there.

> A simple test of "time gzcat /tmp/stuff.tar.gz > /dev/null" indicates
> that gzcat can process about 27 MB/s on a Blade 1500 running at 1062
> MHz.  Rather interestingly, zcat can only do about 19 MB/s.  The file
> stuff.tar contains about 21 MB of stuff from /sbin and /etc on a S9
> box.  These data rates will keep pretty much any internal disk today
> saturated, especially with the number of small files typically found
> in an OS.  However, if cache-based arrays are used for the OS disk or
> the flash archive contains large files, the single-core performance
> can start to get in the way.  Obviously, more analysis is needed
> before deciding this is the future of decompression.
>
> If you look at the problem from the other direction, however,
> compression can benefit greatly from parallel algorithms because even
> the fastest cores are slower at gzip or bzip2 than moderately fast
> disks.
>
Thanks for the ideas.  I'll keep it on my list of things to consider
for the design stage; we'd certainly do some analysis of several
approaches to try to optimize performance.
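Just to make the chunking idea concrete, here's a back-of-the-envelope
sketch of it using nothing but stock tools.  The paths, the 512 MB
chunk size, and the use of gzip are invented for illustration, and a
real archive format would need an index of chunk boundaries so that the
extract side could be parallelized as well:

    # Create: split a tar stream of the image into fixed-size chunks
    # and compress each chunk as its own background job (nothing here
    # bounds the number of jobs -- a real tool would throttle the
    # parallelism).
    tar cf - . | split -b 512m - /export/images/chunk.
    for c in /export/images/chunk.??; do
        gzip "$c" &
    done
    wait

    # Extract: concatenated gzip members decompress as a single stream,
    # so the chunks just get cat'ed back together in order.  This side
    # is still single-threaded, which is exactly the limitation a chunk
    # index in the archive format would remove.
    cat /export/images/chunk.??.gz | gzcat | ( cd /a && tar xf - )

Dave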
