Mike Gerdts wrote:
> On 3/24/06, Dave Miner <Dave.Miner at sun.com> wrote:
>> Mike Gerdts wrote:
>> ...
>>> I'm *so* glad to see that this is an area of focus.  My comments on
>>> the document and installation related tasks follow.
>>>
>>> Page 6, Bullet 2, sub-item 2: SUNWCXall no longer does the trick...
>>> when SUNWCXall is installed on a sun4u box (15k domain) the sun4v
>>> platform support is not added.  This implies that in addition to my
>>> 15k domain used primarily for image development (that ain't cheap), I
>>> now need to have a T1000 or T2000 sitting around for the same purpose.
>>>  In a globally distributed jumpstart environment, I now need to
>>> distribute three ~2 GB flash archives to get x86-64, sun4u, and sun4v
>>> support.
>>>
>> Thanks for pointing this out, as I hadn't noticed it.
> 
> Is this a bug or accepted limitation for some reason?  Has pointing it
> out caused it to be noted in an updated version of the document, a bug
> filed, or both?  I can file the bug through OpenSolaris if this is not
> a conscious design decision.
> 

Sorry, I should have said a bit more.  It's a conscious design decision, 
at least in terms of the way installation was designed oh so many years 
ago, when sun4, sun4c, sun4d, and sun4u all walked the earth, disks were 
small, and so on.  sun4v's the first new architecture in SPARC systems 
in 10 years.  I need to look into the issues a bit, and it'll probably 
lead to some more verbiage in the document when I update it in a couple 
of weeks.

>>> Page 8 - Live Upgrade is also hampered by the following in my environment:
>>>
>>> 1) It uses a version of cpio which does not support sparse files.
>>> This causes files like /var/adm/lastlog to balloon in size when large
>>> UIDs (100,000,000 - 999,999,999) are used.  Similar issues likely
>>> exist if a quotas file happens to be in a partition used for live
>>> upgrade.
> 
> Bug 4480319.  I'm not sure if this is the one that I filed or not, but
> it's been out there for a while.  I've discussed this a bit on
> zones-discuss as well because "zoneadm clone" now has the same problem
> as live upgrade and flash archives.
> 

Yeah, you're listed as one of the customers for it.  Now that we have 
SEEK_HOLE in Nevada, we could probably fix it without too much pain.
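
For readers who have not used the interface, here is a minimal sketch of a hole-aware copy built on SEEK_DATA/SEEK_HOLE.  It is written in Python against the generic lseek() interface, not against cpio or Live Upgrade; the function name copy_sparse and the buffer size are made up for illustration, and whether the destination actually stays sparse depends on the file system.

```python
import errno
import os


def copy_sparse(src_path, dst_path, bufsize=1 << 20):
    """Copy src to dst, reading only the data extents of src.

    Returns the number of bytes actually read and written.  Holes are
    never read; the destination is extended to full size at the end.
    """
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            size = os.fstat(src).st_size
            copied = 0
            offset = 0
            while offset < size:
                try:
                    # Next offset >= offset that contains data.
                    data = os.lseek(src, offset, os.SEEK_DATA)
                except OSError as exc:
                    if exc.errno != errno.ENXIO:
                        raise
                    break  # nothing but a trailing hole remains
                # End of this data extent (EOF counts as a hole).
                hole = os.lseek(src, data, os.SEEK_HOLE)
                pos = data
                while pos < hole:
                    chunk = os.pread(src, min(bufsize, hole - pos), pos)
                    if not chunk:
                        break  # file shrank underneath us
                    view = memoryview(chunk)
                    while view:  # tolerate short writes
                        n = os.pwrite(dst, view, pos)
                        view = view[n:]
                        pos += n
                        copied += n
                offset = hole
            # Recreate any trailing hole by setting the final length.
            os.ftruncate(dst, size)
            return copied
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

On a file system that does not report holes, the whole file typically shows up as one data extent, so the copy should degrade to an ordinary full copy.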

>>> 2) It has spotty support for upgrading to metadevices.
> 
> I am pretty sure that there is a bug on this one, but I am having
> trouble finding it.  Essentially, it boils down to the following
> blowing up:
> 
> lucreate -s - -n newbe -m /:d30:ufs,preserve
> luupgrade -f -n newbe -s $osmedia -J 'archive_location nfs://somewhere'
> 
> To work around this, I have done:
> 
> # cp $osmedia/Solaris_10/Tools/Boot/usr/sbin/install.d/pfinstall \
>     /var/tmp/pfinstall.orig
> # mount -F lofs -O /dir/pfinstall-wrapper \
>     $osmedia/Solaris_10/Tools/Boot/usr/sbin/install.d/pfinstall
> 
> The wrapper causes the following change in the profile before calling
> /var/tmp/pfinstall.orig
> 
> < filesys d30 existing /
> ---
>> filesys mirror:d30 c0t0d0s3 c0t1d0s3 existing /
>> metadb c0t0d0s7
>> metadb c0t1d0s7
> 
> Note that this has worked for me on only one machine, and I just got
> it to work in the past 24 hours.  By no means am I convinced that it
> is a robust workaround yet.
> 

Kind of a scary workaround, but it'll be good input to the bug report.
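
To make the quoted profile change concrete for the bug report, here is a rough sketch of just the text transformation.  Mike's actual wrapper is not shown in this thread, so this Python version is an assumption about its behavior, and the function name rewrite_profile and its parameters are hypothetical.

```python
def rewrite_profile(text, metadevice, submirrors, metadb_slices):
    """Rewrite 'filesys <metadevice> existing <mount>' profile lines.

    Each matching line becomes a 'filesys mirror:...' line followed by
    one 'metadb' line per slice, mirroring the diff quoted above.
    All other lines pass through unchanged.
    """
    out = []
    for line in text.splitlines():
        fields = line.split()
        if (len(fields) == 4 and fields[0] == "filesys"
                and fields[1] == metadevice and fields[2] == "existing"):
            mount_point = fields[3]
            out.append("filesys mirror:%s %s existing %s"
                       % (metadevice, " ".join(submirrors), mount_point))
            out.extend("metadb %s" % s for s in metadb_slices)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    profile = "filesys d30 existing /\n"
    print(rewrite_profile(profile, "d30",
                          ["c0t0d0s3", "c0t1d0s3"],
                          ["c0t0d0s7", "c0t1d0s7"]), end="")
```

A real wrapper would still have to locate the profile among pfinstall's arguments and then exec /var/tmp/pfinstall.orig; that part is left out because the thread does not say how it was done.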

> Beyond that, I found the following problems going from S9 to S10:
> 
> 1) netgroup entries in /etc/shadow were missing but they were in /etc/passwd.
> 2) Solaris 10 should have more default password entries than Solaris 9
> (gdm, webservd, etc.).  These were lost.
> 3) swap and other metadevices were commented out of vfstab
> 4) mount points for lofs file systems were missing
> 5) Complaints about svc:/system/cvc:default in maintenance mode when
> it was not appropriate for the platform (should not have been enabled)
> 6) SVM related services are not enabled
> 7) JASS to the new boot environment looked kinda scary when it started
> out by complaining about shared library problems calling zonename.
> 

Was this going to S10 1/06?  That last one looks like something that 
should occur only in that case.

Seems like a number of class-action scripts didn't work right, though. 
Feels like something really basic went wrong in the installation.

>>> 3) It sometimes requires applying patches and new packages to a running 
>>> system
>> I hope you've filed bug reports on the first two.  Using ZFS will make
>> both less of an issue, since we don't need to copy or use metadevices.
> 
> #2 is an ongoing issue that a co-worker has a case open on right now.
> 
>> #3 is kind of a hard problem; I've got some ideas mentioned in the paper
>> about perhaps using VM's to provide the environment in which the upgrade
>> would run, which would limit the need for patches specifically for the
>> upgrade.  I need to kick those around with the experts a bit to see if
>> they're actually feasible.
> 
> I really like this idea.  I suspect that it will be hard to achieve
> (lots of dom0 support) if Xen cannot be nested arbitrarily deep.
> 

Yes, I expect it's well outside the design center, so it'll be an 
interesting discussion.

>> I expect we'll fix the fragmentation between x86 and SPARC by going to
>> GRUB on SPARC as well.  Long term, I think the model is better for most
>> people, but the transition could have been handled better, I agree.
> 
> This oughta be interesting... Is this part of making zfs bootable
> (that is, is it easier to write the bootstrap code for grub than it is
> for openboot?)
> 

Yes, it's very much related to the zfs boot support.  As anyone who 
tried to use WAN installation found, getting new OBP features released 
on all the platforms is very difficult.

>>> It is really hard to find a reference to anyone other than Sun using
>>> Sun's DHCP server for jumpstart.  Perhaps Sun should consider
>>> migrating to ISC dhcpd where there is much more mindshare (and support
>>> for vendor options that exceed 255 bytes).
>>>
>> There are some postings on comp.unix.solaris on how to use the ISC
>> server if you wish, and I also have a script that one customer was kind
>> enough to provide which does a similar setup for the Microsoft DHCP server.
> 
> The point here is that I need the DHCP server to be supported.  I wish
> that I could support my Solaris jumpstart environment using a
> Sun-supported ISC dhcp server.  My next best option is ISC on Red Hat.
> 

Can't offer you anything in the short term, though the N1 System Manager 
product does so for the platforms it supports.
...
>>> Section 2.2.6 - Installing from flash archives whacks all sysidcfg
>>> information.  In a disaster recovery scenario, you likely don't want
>>> that to happen.  Integration with flash archives and a custom
>>> netbackup agent would be nice...
>>>
>> Agreed about the recovery requirement; can you elaborate on what you're
>> looking for in the integration with a backup system?
> 
> Many backup programs have the ability to do one or more of the following:
> 
> 1) Call a custom module that will generate a data stream to be backed up.
> 2) Call a custom script as a "pre-backup" script.
> 
> It may be (or may not be) useful to tie those mechanisms in with flash
> tools to allow the backup system to manage the retention of old flash
> archives (full or differential).  Then in a restore situation, the
> "special flash" tools on a live CD or network boot would be able to
> load the appropriate data from tape, using the backup system as an
> intermediary.
> 
> Simply doing flash archives to disk, then taking those to tape as
> required may be more practical.
> 

Thanks for clarifying.  I'll think about this some.

>>> Optimization of network performance is sometimes a matter of
>>> optimizing the size of installation media.  If something like flash
>>> archives continues to exist, they should use a better compression tool
>>> than compress(1).
>> Sure, providing options here seems reasonable.
> 
> A key here may be to devise a file format that chunks a data stream
> into lots of somewhat large pieces that are individually compressed,
> using the compression algorithm that gives the right mix of speed and
> size.  When the data stream is being extracted, the various chunks
> could be individually uncompressed on multiple hardware threads.
> 

Seems like it may be overkill based on likely source and destination 
bandwidth, but an interesting idea.
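
As a rough illustration of the chunked format described above, here is a minimal sketch in Python.  The framing (a 4-byte big-endian length before each zlib-compressed chunk), the function names, and the default chunk size are all assumptions made for the example; nothing here reflects the actual flash archive format.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def pack_chunks(data, chunk_size=1 << 20, level=6):
    """Split data into chunks and compress each one independently."""
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        comp = zlib.compress(data[start:start + chunk_size], level)
        out += struct.pack(">I", len(comp))  # length prefix
        out += comp
    return bytes(out)


def split_frames(blob):
    """Return the list of compressed chunks stored in blob."""
    frames = []
    pos = 0
    while pos < len(blob):
        if pos + 4 > len(blob):
            raise ValueError("truncated length prefix at offset %d" % pos)
        (length,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        frame = blob[pos:pos + length]
        if len(frame) != length:
            raise ValueError("truncated chunk at offset %d" % pos)
        frames.append(frame)
        pos += length
    return frames


def unpack_chunks(blob, workers=4):
    """Decompress all chunks, using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so a plain join is enough.
        return b"".join(pool.map(zlib.decompress, split_frames(blob)))
```

Because every chunk is self-contained, the compressor could also pick a different algorithm or level per chunk by adding a tag byte to the frame header, at the cost of a slightly worse ratio than compressing the stream as a whole.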

> With ZFS promoting compressed file systems, perhaps compression is as
> interesting a use for a customized core as an FPU or encryption
> accelerator.
> 
>>> Other ramblings...
>>>
>>> - Options used for debugging the installation process should be part
>>> of the public interface.  Custom installation modules should be able
>>> to hook into the debugging framework through a public interface.
>>>
>> Can you tell me more about what you're after?
> 
> Typical scenarios include:
> 
> 1) Jumpstart tells me it can't find a matching rule in rules.ok.  This
> may mean that it could not find the SjumpsCF directory, that I
> misspelled a hostname, or a host of other things.  To debug, today I
> have to exit the installation (ok), see what is mounted, and run my
> custom "catdhcp" script to show me what DHCP really sent me.  Decoding
> vendor options using dhcpinfo is non-trivial.
> 

Yeah, we could easily do more here.  Used to be snoop was my primary 
debugger of this stuff, but it would be better not to have to go there.
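
For anyone decoding these by hand in the meantime, here is a minimal sketch of the decoding step, assuming the vendor options arrive as the usual encapsulated code/length/value triples described in RFC 2132 (option 43).  This is not Mike's catdhcp script, which is not shown here; the function name is made up, and the sub-option numbers in the name table are assumptions that should be checked against the local DHCP configuration.

```python
def parse_vendor_options(payload):
    """Decode encapsulated vendor options into {code: value_bytes}.

    Code 0 is treated as padding and code 255 as the end marker,
    following the encapsulated-option layout in RFC 2132.
    """
    options = {}
    i = 0
    while i < len(payload):
        code = payload[i]
        if code == 0:       # pad byte, no length field
            i += 1
            continue
        if code == 255:     # end marker, ignore anything after it
            break
        if i + 1 >= len(payload):
            raise ValueError("truncated option header at offset %d" % i)
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated value for option %d" % code)
        options[code] = value
        i += 2 + length
    return options


# Assumed sub-option numbers, for display only; verify locally.
ASSUMED_NAMES = {4: "SrootPTH", 14: "SjumpsCF"}

if __name__ == "__main__":
    value = b"server:/jumpstart"  # made-up example value
    sample = bytes([14, len(value)]) + value + b"\xff"
    for code, raw in parse_vendor_options(sample).items():
        name = ASSUMED_NAMES.get(code, "option %d" % code)
        print(name, raw.decode("ascii", "replace"))
```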

> 2) Debugging jumpstart installations often involves trying to
> understand what is really happening, because there is not a lot of
> documentation about how jumpstart really works.  If
> jumpstart were not running, I would typically use tools like truss,
> snoop, etc. to figure out what is going on.  Obviously, Sun has seen
> this need too because in various Solaris releases you see other
> options in the code that parses the boot command line to take
> debugging arguments.  However, those are not documented and appear
> private.
> 

One of the benefits you'll get out of us doing the future work in the 
open is real design documents out on opensolaris.org, so you'll be able 
to dig in as deep as you like.  Not that we shouldn't also put effort 
into making it easy to understand without that level of effort, though.

> Tools that would be useful:
> 
> 1) Create an option that enables sshd during installation.

I think that's coming soon with one of the security projects.

> 2) Create boot options that specify the dtrace script to run during
> installation.
> 3) Include dtrace in the miniroot (haven't checked sparc, missing from 
> newboot)

Somebody else pointed this out to me a couple of weeks ago.  No reason 
we shouldn't have dtrace available and a way to auto-invoke a D script.

> 4) Consider having the ability to syslog installation progress.
> 

Reasonable suggestion.

> One thing that I have found very nice with various Linux distros
> and Nexenta is that I have virtual consoles (or approximations
> thereof) that allow me to observe the installation process more than
> watching a progress bar.  This is very helpful when getting to know a
> new installer or debugging changes.

Would some ads during the install help? ;^)

Seriously, I hope we'll be bringing back virtual console support soon. 
It's much missed.

Thanks for all your comments, Mike.

Dave
