I have to say that I almost completely concur with Mike Gerdts here.

As one of the primary JET developers I've been continually surprised 
(and continue to be) by the wide variety of things that customers want 
to do with their builds. I've had to add hooks to JET in so many places 
to enable customers to insert or influence their own customised things. 
Given that these customisations are so one-off, there's been no point 
in providing a JET variable or function for them, because each would 
only ever be used once, and by that one customer. When asked for a 
feature, I've always had to make a decision: is this a good 
general-purpose RFE that I should incorporate into JET, or is this a 
one-off that I simply need to facilitate with JET? For every 10 
requests, I'd guess only 1 or so ever makes it in as a JET feature 
(and that's being generous).

Trying to code and capture every single one of these within the 
manifest is folly, kind of like trying to count all the grains of sand 
on a beach. Whenever we provide lists of "things" that we do, they seem 
to simply get added to the infinite list of "things AI could do in the 
manifest". You need to accept that there will ALWAYS be something that 
you/we have not thought of, and we MUST have a mechanism/hook to be 
able to override if required.
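
The hook mechanism itself doesn't have to be elaborate. As a minimal 
sketch (this is not JET's actual interface; the directory layout, 
variable name, and phase names are made up for illustration), a build 
could simply run whatever executable scripts a customer drops into a 
per-phase directory:

```shell
#!/bin/sh
# Hypothetical hook runner: execute any customer-supplied scripts found
# in a per-phase directory. HOOK_DIR and the phase names are illustrative.
run_hooks() {
    phase="$1"
    dir="${HOOK_DIR:-/opt/build/hooks}/$phase"
    [ -d "$dir" ] || return 0          # no hooks for this phase: fine
    for script in "$dir"/*.sh; do
        [ -x "$script" ] && "$script"  # run each executable hook in turn
    done
}

# e.g. a build framework would call:
#   run_hooks pre-install
#   run_hooks post-install
```

The point is that the framework never needs to know what the hook does; 
the one-off customisation lives entirely with the customer.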

There's a balance between providing for the masses (presenting a 
limited number of best-practice options and reducing the complexity of 
the install, which I think we MUST do) and enabling the expert user to 
configure down to the tiniest detail. Presenting too much functionality 
in the manifest may alienate some users.

I also agree with Peter Tribble: yes, most of the underlying Jumpstart 
code is old and difficult to maintain (well, it's not all that bad 
really), but we've come a long way. Jumpstart is well understood by 
many of our customers, and it uses protocols that are already widely 
deployed at their sites.

So here's a novel idea: why don't we maintain the Jumpstart "API" at 
some level? Let's rewrite setup_install_server and add_install_client 
and make them smarter. Let's remove bootp and standardise on DHCP 
builds. Let's replace those special SUNW boot symbols for SPARC builds 
with something more like the menu.lst.MACADDRESS approach. Let's remove 
the need for sysidcfg by incorporating it into the menu.lst mechanism 
somehow. Let's extend the profile so that we can do more within it. 
Let's simplify the rules file down to a single rule, but leverage 
better "begin" scripts so that we get derived profiles by default. (A 
good begin script obviates the need for a rules file.)
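
As a rough sketch of that last idea (the file names, the model check, 
and the profile contents are illustrative, not a proposal for real 
syntax): the rules file shrinks to a single catch-all line such as 
"any - Begin/derive-profile.beg = -", and the begin script writes the 
real profile to $SI_PROFILE at install time:

```shell
#!/bin/sh
# Illustrative derived-profile begin script. In a real Jumpstart run the
# installer sets SI_PROFILE; a default is provided so the sketch runs
# standalone. The model-specific layout rules are hypothetical.
SI_PROFILE=${SI_PROFILE:-/tmp/derived.profile}

MODEL=$(uname -i)    # platform name, e.g. SUNW,Sun-Fire-T200

# Common settings every client gets.
cat > "$SI_PROFILE" <<EOF
install_type    initial_install
system_type     standalone
partitioning    explicit
EOF

# Model-specific disk layout decided at install time.
case "$MODEL" in
SUNW,Sun-Fire-T200)
    echo "filesys c0t0d0s0 8192 /" >> "$SI_PROFILE" ;;
*)
    echo "filesys rootdisk.s0 free /" >> "$SI_PROFILE" ;;
esac
```

With every client funnelled through one script like this, the 
per-client rules maintenance disappears.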

So from a user perspective there are no new protocols, just a couple of 
functionality extensions. Under the covers it's all nice bright shiny 
new code, but the burden of learning new tasks is removed.

Finally: we seem to be focussing AI purely on the O/S install bit. The 
funny thing is that the O/S install bit in Jumpstart is absolutely 
fine; customers don't have a problem with that part. The HARD part of 
Jumpstart is what comes AFTER the O/S is installed, when people want to 
deploy useful things like applications and other esoteric configuration 
parameters. You can't replace Jumpstart without providing the 
postinstall functionality, which is where all the work that customers 
consider valuable gets done. There are three levels to Jumpstart:

1: Doing a net install, and running the interactive Solaris Installer. 
(add_install_client, provide mac address, IP address)

2: Non-interactive net-install: (add_install_client, provide mac 
address, IP address, rule pointing to profile, sysidcfg): Solaris 
installs unattended.

3: Non-interactive net-install: (add_install_client, provide mac, ip, 
rule pointing to profile and finish script, sysidcfg): Solaris installs 
unattended, plus installs all the cool and interesting stuff that 
customers actually want to run.

Obviously levels 1 and 2 are just building blocks for level 3, but it 
seems that all the AI focus is on level 2, while the customer value 
actually derives from level 3. If you look at JET, the SUNWjet package 
deals with level 2; all the other modules do the level 3 stuff.
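
To make level 3 concrete, here's a minimal sketch of the kind of finish 
script that carries the customer value. The paths and the pkgadd line 
are hypothetical; during a real Jumpstart run the target filesystem is 
mounted at /a, and here ROOT defaults to a scratch directory so the 
sketch is runnable standalone:

```shell
#!/bin/sh
# Sketch of a level-3 finish script. It layers site customisation on top
# of the freshly installed image by planting a self-removing first-boot
# script on the target filesystem.
ROOT=${ROOT:-/tmp/js-target}   # /a during a real Jumpstart run

mkdir -p "$ROOT/etc/rc3.d"

# The installed system runs this once on first boot, then it removes
# itself; this is where applications and site config get installed.
cat > "$ROOT/etc/rc3.d/S99firstboot" <<'EOF'
#!/bin/sh
# e.g. pkgadd -d /net/pkgserver/pkgs SITEapp   (hypothetical package)
rm -- "$0"
EOF
chmod 755 "$ROOT/etc/rc3.d/S99firstboot"
```

Everything interesting a customer does at level 3 is some variation on 
this pattern, which is exactly why the postinstall hooks matter more 
than the O/S install itself.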

(And Mike Gerdts, sorry if someone mis-sold JET to you as a PS-only 
thing. I've checked the archives of the external JET mailing list, and 
you never asked that list for anything... we've given away pretty much 
every module once we knew it wouldn't create a support burden, and most 
of them are available in the sun.com/download bundle anyway.) 
(http://wikis.sun.com/display/JET )

Mike

Sarah Jelinek wrote:
> Hi Mike,
>
> Thank you for responding.. more comments/questions inline..
>
>
>> On Fri, May 8, 2009 at 9:20 AM, Sarah Jelinek <Sarah.Jelinek at sun.com> 
>> wrote:
>>  
>>> Hi Mike,
>>>
>>> Thank you for this data! I do have some comments inline..
>>>    
>>>> I noticed in the AI Client Redesign Meeting Notes[1]:
>>>>
>>>> Then there was a discussion about Derived Profiles. The outcome was
>>>> to gather requirements around the following:
>>>>
>>>> | - What does deriving mean? That is, what aspects of the profile may
>>>> |   be derived? What problems are we trying to solve here?
>>>> | - Who derives a profile? Some clients or all the clients?
>>>> | - Should the client support substitution of certain fields in the
>>>> |   AI manifest? If yes, what problem will that solve?
>>>> | - How does this impact the criteria selection on the AI server?
>>>>
>>>> Currently I use derived profiles to do the following:
>>>>
>>>> - Customize partitioning based upon server model, disk size,
>>>>  memory size, etc.
>>>>
>>>>       
>>> Can you be specific about the criteria you feel are requirements? 
>>> What I
>>> mean by criteria is what things do you believe must be included so 
>>> that the
>>> client can probe and effectively create the correct derived profile?
>>>     
>>
>> Things that are in my current and/or begin scripts that derive the
>> profile include:
>>
>> - Always create / and alt-/ of the same size and on the same disks
>> - If enough disks are available, mirror everything
>> - If the disks are big enough, create / and alt-/ as X GB, else X/2 GB
>> - If running Solaris 10 or later, use leftover space for soft 
>> partitions.
>> - If running Solaris 9 or earlier, mount leftover space at /local
>> - If running on V240, V440, T2000, etc., root gets mirrored across
>> disks on the same controller
>> - If running on a 6800, 15K, 25K, etc., find the two JBODs that are
>> attached and mirror across them.
>> - If running on a Thumper, be sure to mirror across the devices that
>> the BIOS has access to
>> - If special device aliases (e.g. jsroot1, jsroot2) are found by
>> probing OBP, find the disks associated with them and install there
>> instead of using the rules above.
>> - Determine what site I am in (based on IP) and download the flash
>> archive from there
>>
>> Translated into the new way, this probably means:
>>
>> - Have the ability for the sysadmin - not the tool - to select which
>> disks to install onto.
>> - Provide a means that is flexible enough that disk selection can be
>> done by physical path, as I do above with jsroot*.  This is important
>> because the controller number can vary based on which PCI or PCIe
>> cards are installed.  I would hate to install Solaris onto SAN disks
>> (overwriting application data) when I meant to write to local disk.
>> - Have the ability to specify the size of rpool, which may be smaller
>> than a single disk.
>> - Have the ability to specify where other zpools should reside
>> - Have the ability to tune the size and possibly location of swap &
>> dump.  That is, a system with small drives (old or SSD) might put swap
>> & dump in a separate pool - or may decide to use SVM for swap & dump
>> because ZFS increases the space requirements for them.
>> - Specify which mirror(s) or repository(s) to install from, based on
>> locally defined location rules.
>> - Specify proxy based on locally defined location rules.
>>   
> From the above, we can provide for all of this with our AI manifest 
> specification, but we need to:
> - Use physical path for disk specification
> - Allow for specification of the rpool size
> - Allow for other zpool creation
> - Allow for swap and dump size specification
>
> I also read this from your description:
>
> -You want to be able to query controller types
> -You want to be able to query disk sizes and types
> -You want to be able to query system hardware, for type of machine you 
> are installing
> -You want to query network information
>>  
>>>> - Select the appropriate flash archive based on server model
>>>>  (primarily sun4u vs. sun4v vs i86pc)
>>>>
>>>> I have a lot of logic in finish scripts (JASS) and third-party system
>>>> management tools that does various other things based upon location
>>>> (derived from IP address), OS revision, and other criteria that is 
>>>> very
>>>> hard or impossible to acquire automatically.  Arguably, the bulk of 
>>>> JASS
>>>> is
>>>> legacy baggage with secure by default.
>>>>
>>>> As I look forward, I would like to derive profiles that:
>>>>
>>>> - Lay out storage properly.  The definition of "properly" will likely
>>>>   be dependent on criteria that doesn't work for everyone.  That is, at
>>>>  MyCo we may boot from local disk and want two compressed mirrors.  At
>>>>  YourCo "properly" means to use the lowest numbered LUN presented via
>>>>  iSCSI from storage array X.
>>>> - Select software to install based on somewhat arbitrary rules.  
>>>> That is,
>>>>  at site X I need the omniback package and site Y I need 
>>>> netbackup.  If
>>>>  it's the primary ldom of a sun4v box, install LDoms Manager 2.4.
>>>>
>>>>       
>>> What types of data would drive the rules for the software choices?
>>>     
>>
>> The primary IP address along with a populated netmasks file and some
>> home-brew logic drives site identification.
>>
>> Probing OBP for aliases (prtpicl -c aliases -v) is great for
>> system-specific overrides.
>>
>>   
> Ok, this is one I didn't think of. Good to know.
>
>> Querying the network for which subnets are available shows some
>> promise (e.g. snooping for EIGRP packets and seeing "VLAN#50
>> 10.0.50.0 - 224.0.0.10", or eventually LLDP) for making decisions.
>>
>>  
>>>> - Select repository (or mirror) based on location such that I don't
>>>> install
>>>>  across the Atlantic if I have a closer copy.
>>>> - Select repository based on location (lab installs experimental bits)
>>>> - Require production servers to have packages signed by the OS 
>>>> vendor or
>>>> by
>>>>  internal QA.  That is, make it impossible to install experimental
>>>>  third-party software on production.
>>>>
>>>>       
>>> How would we be able to determine it is a production server? I 
>>> assume the
>>> profile you would derive in this case would have its ips repo set for
>>> installation such that there wouldn't be experimental software. Is 
>>> this what
>>> you are thinking?
>>>     
>>
>> This would likely feed off of subnet-based rules.  Arguably, this is
>> probably more easily dealt with by selecting a different base
>> installation profile (prod vs. lab) on the AI server.  The derived
>> profile would probably just tweak this base profile for
>> hardware-specific items and picking the closest appropriate repo.
>>
>>   
> Ok.
>>> A few more questions:
>>>
>>> 1. How easy is it for you to use, and configure your current jumpstart
>>> configuration to enable derived profiles? Are the user interfaces 
>>> easy to
>>> use?
>>>     
>>
>> The current setup of a jumpstart client involves (as a non-privileged
>> user), setting up a system-specific wanboot.conf and system.conf using
>> a fairly simple script.
>>
>> jumpstartzone$ /jumpstart/<release>/add_client_wanboot -e <mac> -h
>> <hostname> ...
>> Run this at the OpenBoot prompt:
>>     ok setenv network-boot-arguments=...
>>
>> ok setenv network-boot-arguments=...
>> ok boot net - install
>>
>> Every client uses the same begin script to derive the profile.  I
>> tweak the Begin/derive-profile.beg script when I do a new image
>> release (point it to the next flar) or something comes up that causes
>> other problems (like Solaris becomes huge and needs more than 8 GB for
>> /).
>>
>> The rules for site determination use a netmasks file and a "subnets" 
>> file. e.g.:
>>
>>     10.0.1.0 SiteA
>>     10.0.2.0 SiteB
>>
>> The rules for selecting installation disks require a diskmap file that
>> looks like:
>>
>> # 480R - note that they use a qlogic fiber channel chip just like our HBA's do
>> root1 SUNW,Sun-Fire-480R c.t0d0s2 ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0
>> root2 SUNW,Sun-Fire-480R c.t1d0s2 ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0
>>
>> # T5220
>> root1 SUNW,SPARC-Enterprise-T5220 c.t0d0s2 ../../devices/pci@0/pci@0/pci@2/scsi@0
>> root2 SUNW,SPARC-Enterprise-T5220 c.t1d0s2 ../../devices/pci@0/pci@0/pci@2/scsi@0
>> data1 SUNW,SPARC-Enterprise-T5220 c.t2d0s2 ../../devices/pci@0/pci@0/pci@2/scsi@0
>> data2 SUNW,SPARC-Enterprise-T5220 c.t3d0s2 ../../devices/pci@0/pci@0/pci@2/scsi@0
>>
>> If I just used the first two disks in $SI_DISK_LIST, the 480R may give
>> the disks I list above or something that is storing an oracle database
>> out on the SAN.  Best to avoid overwriting the database.
>>
>>  
>>> 2. What do you like about the way it is currently implemented?
>>>     
>>
>> - It works.
>> - I can trust that the just-hired-last-week junior sysadmin armed with
>> a simple procedure can install Solaris per standards without risk of
>> breaking the jumpstart environment for everyone else.  That is, since
>> there is no customization to perform on the jumpstart server there is
>> no chance that someone that is not tasked with maintaining jumpstart
>> will break jumpstart.
>> - Policy enforcement via scripting is much easier, accurate, and
>> cost-effective than policy enforcement via training, audits,
>> remediation, retraining, etc. (Sysadmins need to know what they are
>> doing, but need to be focused on value-add, not minutia.)
>>
>>   
>
> This is excellent data.
>>> 3. What don't you like?
>>>     
>>
>> - It took way too much work to make all of this reliable and workable
>> for a single process to use on a global basis.
>> - Making everything work equally well for network-based and DVD-based
>> installations was difficult.
>> - When things don't work (which is extremely rare - see 2 above),
>> jumpstart is hard to debug because there is no documented way to
>> observe the process, nor a documented way to restart the
>> process without enduring POST, a slow wanboot download, etc.  I know
>> many tricks to get around this, but learning them was painful and
>> often only possible because of the extensive use of shell
>> scripts during installation.
>> - Automated installation has way too much of "every customer must
>> figure it out for themselves."
>>
>> It seems as though the last point is supposed to be addressed with
>> JASS and/or JET.  For me,
>> JASS (now unmaintained and not yet open source) was a big help for
>> automation of security hardening.  JET came to my attention long after
>> JASS was already working (including various custom written modules).
>> Almost every introduction I had to JET felt like a sales job for
>> professional services.  That is, the thing that I needed didn't come
>> with JET, but if I paid for some professional services they would
>> provide it.  Well, the thing that I needed was typically easier to
>> bolt onto JASS than it was to go through the requisition process for
>> professional services.
>>
>> Striking the balance between everyone having to figure it out for
>> themselves and lacking required flexibility is extremely difficult.  I
>> would prefer that we currently err in favor of giving too much
>> flexibility. Having flexibility will allow sysadmins to come up with
>> clever ways of accomplishing what they need to do, hopefully leading
>> to contributions of those clever things back to the community.  I
>> worry that lack of flexibility will hinder adoption at large sites -
>> all of which will suddenly sing the praises of jumpstart.
>>
>>   
> Fair points. And, really good insight into the issues you encounter.
>
>
> Thank you again for the data. We are listening and will include this 
> input in our process for the AI redesign.
>
> sarah
> *****
> _______________________________________________
> caiman-discuss mailing list
> caiman-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
