[caiman-discuss] derived profiles requirements

Sarah Jelinek Mon, 11 May 2009 07:55:20 -0600

Hi Michael,


> I have to say that I almost completely concur with Mike Gerdts here.
>
> As one of the primary JET developers I've been continuously surprised 
> (and continue to be continuously surprised) by the wide number of 
> things that customers want to do with their builds. I've had to add 
> hooks to JET in so many places to enable customers to insert/influence 
> their own customised thing. Given that these customisations are so 
> one-off, there's been no point in providing a JET variable or function 
> to do it, because it would only ever be used once and by that 
> customer. When asked for a feature, I've always had to make the 
> decision: Is this a good general purpose RFE that I should incorporate 
> into JET? Or is this a one-off that I simply need to facilitate with 
> JET. For every 10 requests, I guess only 1 or so ever make it as a JET 
> feature (and that's being generous)
>
> To try and be able to code and capture every single one of these 
> within the manifest is folly, kind of like trying to count all the 
> grains of sand on a beach. The response when we provide lists of 
> "things" that we do seems to simply get added to the infinite list of 
> "things AI could do in the manifest". You need to accept that there 
> will ALWAYS be something that you/we have not thought of, and we MUST 
> have a mechanism/hook to be able to override if required.
Actually, I am not saying that everything Mike or others have indicated 
they use or need could be done in the AI manifest. I am asking these 
questions to gather data and requirements and to understand how 
jumpstart is used today. This is important information. As a matter of 
fact, much of the discussion about derived profiles and how Mike uses 
them isn't about the AI manifest implementation.

I agree that there will always be something we have not thought of that 
customers want.
>
>
> There's a balance between providing for the masses by presenting a 
> limited number of best practice options and reducing the complexity of 
> the install, which I think we MUST do, and enabling the expert user to 
> configure down to the tiniest detail. Presenting too much 
> functionality in the manifest may alienate some users.

Agreed. We have certainly tried to trim down our installers, for good 
reason. Our past practice was to add new functionality to the installer 
every time something new was available for configuration. Now though, 
with smf enhanced profiles, nwam, and other system configuration 
utilities coming on board, we are trying to figure out how to serve the 
requirements of our users with the technology OpenSolaris currently 
offers, without putting another 'wart' on the installer.

>
> I also agree with Peter Tribble, yes, most of the underlying Jumpstart 
> code is old and difficult to maintain (well, it's not all that bad 
> really), but we've come a long way, and Jumpstart is well understood 
> by many of our customers, using protocols that are widely used by our 
> customers.
>
> So here's a novel idea, why don't we maintain the Jumpstart "API" at 
> some level. Let's rewrite setup_install_server, and add_install_client 
> and make them smarter. Let's remove bootp and standardise on dhcp 
> builds. Let's remove the need for those special SUNW symbols for the 
> SPARC builds (something more like menu.lst.MACADDRESS for SPARC). 
> Let's remove the need for sysidcfg by incorporating it into the 
> menu.lst thing somehow. Let's extend the profile so that we can do 
> more within it. Let's simplify the rules file so that we only have one 
> rule, but leverage a better "begin" script so that we can do derived 
> profiles by default. (A good begin script obviates the need for a 
> rules file)
>
> So from a user perspective, there's no new protocols, there's a couple 
> of functionality extensions. Under the covers it's all nice bright 
> shiny new code, but the burden of learning new tasks is removed.
>

We do understand that the transition to a new technology like AI 
requires customer buy-in, and that we must provide a migration plan and 
ensure that deploying AI in their current setup is easy.

The truth is we believe that the new technology OpenSolaris offers 
allows us to do a better job with automated installation. I hear you 
with regard to keeping the 'front-end' of jumpstart and whacking the 
back end. The issues we found with the current add_install_client and 
setup_install_server were not just cleanup. Even if we tried to do 
something like this, customers would have a fairly large transition to 
our new shiny front end. It wouldn't just be functionality extensions.


> Finally: We seem to be focussing AI on purely the O/S install bit. The 
> funny thing is that the O/S install bit in Jumpstart is absolutely 
> fine. Customers don't have a problem with that bit. The HARD part 
> about Jumpstart is the bit AFTER the O/S is installed, when people 
> want to deploy useful things like applications and other esoteric 
> configuration parameters. You can't replace Jumpstart without 
> providing the postinstall functionality, which is where all the work 
> that the customers consider to be valuable gets done. There's 3 levels 
> to Jumpstart:

We are not only focusing on the O/S install bit for AI. This discussion 
regarding derived profiles focuses on that because that's the problem 
space. But, we are looking at all aspects of AI, and I mean all.

>
> 1: Doing a net install, and running the interactive Solaris Installer. 
> (add_install_client, provide mac address, IP address)
>
> 2: Non-interactive net-install: (add_install_client, provide mac 
> address, IP address, rule pointing to profile, sysidcfg): Solaris 
> installs unattended.
>
> 3: Non-interactive net-install: (add_install_client, provide mac, ip, 
> rule pointing to profile and finish script, sysidcfg): Solaris 
> installs unattended, plus installs all the cool and interesting stuff 
> that customers actually want to run.
>

There is one other level, setup_install_server. For AI we have two 
levels at this point, client and server. We do not have interactive 
installation for AI.

I get your point though. We have broken down the redesign effort for AI 
into parts to help us understand the issues. Sometimes the 
issues get glommed together. For example, right now on the server side 
of AI we have made an install 'service' be about the actual boot image, 
webserver setup, dns-sd, dhcp, ... This has caused us issues with the 
design.

You will see other emails about design and redesign of the other pieces 
of AI.


Thank you for your time and for your insights.

Regards,
sarah
****

> Obviously level 1 and level 2 are just building blocks to 3, but it 
> seems that all the focus on AI is on Level 2, while the customer value 
> actually derives from 3. If you look at JET, the SUNWjet package deals 
> with level 2, all the other modules are the things that do all the 
> level 3 stuff.
>
> (And Mike Gerdts, sorry if someone missold JET to you as a PS only 
> thing, I've checked the archives of the external JET mailing list, and 
> you never asked that list for anything.... we've given away pretty 
> much every module once we knew it wouldn't create a support burden, 
> and most of them are available in the sun.com/download bundle anyway)  
> (http://wikis.sun.com/display/JET )
>
> Mike
>
> Sarah Jelinek wrote:
>> Hi Mike,
>>
>> Thank you for responding.. more comments/questions inline..
>>
>>
>>> On Fri, May 8, 2009 at 9:20 AM, Sarah Jelinek 
>>> <Sarah.Jelinek at sun.com> wrote:
>>>  
>>>> Hi Mike,
>>>>
>>>> Thank you for this data! I do have some comments inline..
>>>>   
>>>>> I noticed in the AI Client Redesign Meeting Notes[1]:
>>>>>
>>>>> Then there was a discussion about Derived Profiles. The outcome was
>>>>> to gather requirements around the following:
>>>>>
>>>>> | - What does deriving mean? That is, what aspects of the profile may
>>>>> |   be derived? What problems are we trying to solve here?
>>>>> | - Who derives a profile? Some clients or all the clients?
>>>>> | - Should the client support substitution of certain fields in the
>>>>> |   AI manifest? If yes, what problem will that solve?
>>>>> | - How does this impact the criteria selection on the AI server?
>>>>>
>>>>> Currently I use derived profiles to do the following:
>>>>>
>>>>> - Customize partitioning based upon server model, disk size,
>>>>>  memory size, etc.
>>>>>
>>>>>       
>>>> Can you be specific about the criteria you feel are requirements?
>>>> What I mean by criteria is what things do you believe must be
>>>> included so that the client can probe and effectively create the
>>>> correct derived profile?
>>>>     
>>>
>>> Things that are in my current and/or begin scripts that derive the
>>> profile include:
>>>
>>> - Always create / and alt-/ of the same size and on the same disks
>>> - If enough disks are available, mirror everything
>>> - If the disks are big enough, create / and alt-/ as X GB, else X/2 GB
>>> - If running Solaris 10 or later, use leftover space for soft 
>>> partitions.
>>> - If running Solaris 9 or earlier, mount leftover space at /local
>>> - If running on V240, V440, T2000, etc., root gets mirrored across
>>> disks on the same controller
>>> - If running on a 6800, 15K, 25K, etc., find the two JBODs that are
>>> attached and mirror across them.
>>> - If running on a Thumper, be sure to mirror across the devices that
>>> the BIOS has access to
>>> - If special device aliases (e.g. jsroot1, jsroot2) are found by
>>> probing OBP, find the disks associated with them and install there
>>> instead of using the rules above.
>>> - Determine what site I am in (based on IP) and download the flash
>>> archive from there
>>>
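The sizing and mirroring rules at the top of that list can be sketched in a few lines. This is only an illustration, not Mike's begin script: the function name, the 8 GB value for "X", and the 36 GB "big enough" threshold are all assumptions made for the example, and the platform-specific rules (controllers, JBODs, OBP aliases) are left out.

```python
from dataclasses import dataclass


@dataclass
class RootLayout:
    root_gb: float   # size of / (alt-/ is created with the same size)
    mirrored: bool   # True when / and alt-/ are mirrored
    disks: list      # disks that carry / and alt-/


def plan_root_layout(disk_sizes_gb, full_root_gb=8.0, big_disk_gb=36.0):
    """Apply the size and mirror rules to a {disk_name: size_gb} mapping.

    full_root_gb ("X") and big_disk_gb are hypothetical defaults.
    """
    if not disk_sizes_gb:
        raise ValueError("no candidate disks")
    names = sorted(disk_sizes_gb)
    # "If enough disks are available, mirror everything."
    mirrored = len(names) >= 2
    chosen = names[:2] if mirrored else names[:1]
    # Judge "big enough" on the smallest chosen disk, because every
    # chosen disk has to hold both / and alt-/.
    smallest = min(disk_sizes_gb[d] for d in chosen)
    root_gb = full_root_gb if smallest >= big_disk_gb else full_root_gb / 2
    return RootLayout(root_gb=root_gb, mirrored=mirrored, disks=chosen)
```

A real derived profile would feed the result into the profile keywords (or an AI manifest); here the point is only that the decision is plain conditional logic on probed facts.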
>>> Translated into the new way, this probably means:
>>>
>>> - Have the ability for the sysadmin - not the tool - to select which
>>> disks to install onto.
>>> - Provide a means that is flexible enough that disk selection can be
>>> done by physical path, as I do above with jsroot*.  This is important
>>> because the controller number can vary based on which PCI or PCIe
>>> cards are installed.  I would hate to install Solaris onto SAN disks
>>> (overwriting application data) when I meant to write to local disk.
>>> - Have the ability to specify the size of rpool, which may be smaller
>>> than a single disk.
>>> - Have the ability to specify where other zpools should reside
>>> - Have the ability to tune the size and possibly location of swap &
>>> dump.  That is, a system with small drives (old or SSD) might put swap
>>> & dump in a separate pool - or may decide to use SVM for swap & dump
>>> because ZFS increases the space requirements for them.
>>> - Specify which mirror(s) or repository(s) to install from, based on
>>> locally defined location rules.
>>> - Specify proxy based on locally defined location rules.
>>>   
>> From the above, we can provide for all of these with our AI manifest 
>> specification except:
>> -Use physical path for disk specification
>> -Allowing for specification of the rpool size
>> -Allowing for other zpool creation
>> -Swap and dump size specification
>>
>> I also read this from your description:
>>
>> -You want to be able to query controller types
>> -You want to be able to query disk sizes and types
>> -You want to be able to query system hardware, for type of machine 
>> you are installing
>> -You want to query network information
>>>  
>>>>> - Select the appropriate flash archive based on server model
>>>>>  (primarily sun4u vs. sun4v vs i86pc)
>>>>>
>>>>> I have a lot of logic in finish scripts (JASS) and third-party system
>>>>> management tools that does various other things based upon location
>>>>> (derived from IP address), OS revision, and other criteria that is
>>>>> very hard or impossible to acquire automatically.  Arguably, the
>>>>> bulk of JASS is legacy baggage with secure by default.
>>>>>
>>>>> As I look forward, I would like to derive profiles that:
>>>>>
>>>>> - Lays out storage properly.  The definition of "properly" will
>>>>>  likely be dependent on criteria that doesn't work for everyone.
>>>>>  That is, at MyCo we may boot from local disk and want two
>>>>>  compressed mirrors.  At YourCo "properly" means to use the lowest
>>>>>  numbered LUN presented via iSCSI from storage array X.
>>>>> - Select software to install based on somewhat arbitrary rules.
>>>>>  That is, at site X I need the omniback package and site Y I need
>>>>>  netbackup.  If it's the primary ldom of a sun4v box, install LDoms
>>>>>  Manager 2.4.
>>>>>
>>>>>       
>>>> What types of data would drive the rules for the software choices?
>>>>     
>>>
>>> The primary IP address along with a populated netmasks file and some
>>> home-brew logic drives site identification.
>>>
>>> Probing OBP for aliases (prtpicl -c aliases -v) is great for
>>> system-specific overrides.
>>>
>>>   
>> Ok, this is one I didn't think of. Good to know.
>>
>>> Querying the network for which subnets are available shows some
>>> promise (e.g. snooping for EIGRP packets on and seeing "VLAN#50
>>> 10.0.50.0 - 224.0.0.10", or eventually LLDP) for making decisions.
>>>
>>>  
>>>>> - Select repository (or mirror) based on location such that I don't
>>>>>  install across the Atlantic if I have a closer copy.
>>>>> - Select repository based on location (lab installs experimental
>>>>>  bits)
>>>>> - Require production servers to have packages signed by the OS
>>>>>  vendor or by internal QA.  That is, make it impossible to install
>>>>>  experimental third-party software on production.
>>>>>
>>>>>       
>>>> How would we be able to determine it is a production server? I
>>>> assume the profile you would derive in this case would have its ips
>>>> repo set for installation such that there wouldn't be experimental
>>>> software. Is this what you are thinking?
>>>>     
>>>
>>> This would likely feed off of subnet-based rules.  Arguably, this is
>>> probably more easily dealt with by selecting a different base
>>> installation profile (prod vs. lab) on the AI server.  The derived
>>> profile would probably just tweak this base profile for
>>> hardware-specific items and picking the closest appropriate repo.
>>>
>>>   
>> Ok.
>>>> A few more questions:
>>>>
>>>> 1. How easy is it for you to use, and configure your current jumpstart
>>>> configuration to enable derived profiles? Are the user interfaces
>>>> easy to use?
>>>>     
>>>
>>> The current setup of a jumpstart client involves (as a non-privileged
>>> user), setting up a system-specific wanboot.conf and system.conf using
>>> a fairly simple script.
>>>
>>> jumpstartzone$ /jumpstart/<release>/add_client_wanboot -e <mac> -h
>>> <hostname> ...
>>> Run this at the OpenBoot prompt:
>>>     ok setenv network-boot-arguments=...
>>>
>>> ok setenv network-boot-arguments=...
>>> ok boot net - install
>>>
>>> Every client uses the same begin script to derive the profile.  I
>>> tweak the Begin/derive-profile.beg script when I do a new image
>>> release (point it to the next flar) or something comes up that causes
>>> other problems (like Solaris becomes huge and needs more than 8 GB for
>>> /).
>>>
>>> The rules for site determination use a netmasks file and a "subnets" 
>>> file. e.g.:
>>>
>>>     10.0.1.0 SiteA
>>>     10.0.2.0 SiteB
>>>
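One way to read the site rule is: mask the client's primary IP with its netmask, then look the resulting network up in the "subnets" table. The sketch below assumes that reading. The function names are hypothetical, the netmask is passed in directly rather than parsed from a netmasks file, and the "home-brew logic" mentioned earlier is not reproduced.

```python
import ipaddress


def parse_subnets(text):
    """Parse 'network site' lines into {IPv4Address: site_name}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # allow comments and blank lines
        if not line:
            continue
        network, site = line.split()
        table[ipaddress.IPv4Address(network)] = site
    return table


def site_for(ip, netmask, subnets):
    """Return the site for a client IP, or None if its subnet is unknown."""
    masked = int(ipaddress.IPv4Address(ip)) & int(ipaddress.IPv4Address(netmask))
    return subnets.get(ipaddress.IPv4Address(masked))
```

For example, with the two entries shown and a netmask of 255.255.255.0, a client at 10.0.2.37 masks to 10.0.2.0 and so resolves to SiteB. Returning None for an unknown subnet leaves the caller free to fall back to a default site or abort the install.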
>>> The rules for selecting installation disks require a diskmap file that
>>> looks like:
>>>
>>> # 480R- note that they use a qlogic fiber channel chip just like our HBA's do
>>> root1 SUNW,Sun-Fire-480R c.t0d0s2 ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0
>>> root2 SUNW,Sun-Fire-480R c.t1d0s2 ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0
>>>
>>> # T5220
>>> root1 SUNW,SPARC-Enterprise-T5220 c.t0d0s2 ../../devices/pci@0/pci@0/pci@2/scsi@0
>>> root2 SUNW,SPARC-Enterprise-T5220 c.t1d0s2 ../../devices/pci@0/pci@0/pci@2/scsi@0
>>> data1 SUNW,SPARC-Enterprise-T5220 c.t2d0s2 ../../devices/pci@0/pci@0/pci@2/scsi@0
>>> data2 SUNW,SPARC-Enterprise-T5220 c.t3d0s2 ../../devices/pci@0/pci@0/pci@2/scsi@0
>>>
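A diskmap like this is straightforward to consume from a script. The sketch below assumes that each entry in the actual file is a single line of four whitespace-separated fields (alias, platform name, disk name pattern, physical device path) and that the "." in the disk pattern stands for any one character in the controller position. Both are guesses from the sample, and all names in the code are hypothetical.

```python
import re
from typing import NamedTuple


class DiskMapEntry(NamedTuple):
    alias: str         # e.g. root1, data2
    platform: str      # e.g. SUNW,Sun-Fire-480R
    disk_pattern: str  # disk name with "." in the controller position
    device_path: str   # physical path the disk must sit under


def parse_diskmap(text):
    """Parse diskmap text, skipping comments and blank lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Raises TypeError if a line does not have exactly four fields.
        entries.append(DiskMapEntry(*line.split()))
    return entries


def disks_for(entries, platform, alias_prefix="root"):
    """Entries for one platform whose alias starts with alias_prefix."""
    return [e for e in entries
            if e.platform == platform and e.alias.startswith(alias_prefix)]


def matches(entry, disk_name):
    """Assumption: the disk pattern is treated as a regular expression."""
    return re.fullmatch(entry.disk_pattern, disk_name) is not None
```

Keying the choice on platform and physical path, rather than on whatever happens to come first in $SI_DISK_LIST, is what keeps the install off the SAN LUNs when controller numbering changes.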
>>> If I just used the first two disks in $SI_DISK_LIST, the 480R may give
>>> the disks I list above or something that is storing an oracle database
>>> out on the SAN.  Best to avoid overwriting the database.
>>>
>>>  
>>>> 2. What do you like about the way it is currently implemented?
>>>>     
>>>
>>> - It works.
>>> - I can trust that the just-hired-last-week junior sysadmin armed with
>>> a simple procedure can install Solaris per standards without risk of
>>> breaking the jumpstart environment for everyone else.  That is, since
>>> there is no customization to perform on the jumpstart server there is
>>> no chance that someone that is not tasked with maintaining jumpstart
>>> will break jumpstart.
>>> - Policy enforcement via scripting is much easier, accurate, and
>>> cost-effective than policy enforcement via training, audits,
>>> remediation, retraining, etc. (Sysadmins need to know what they are
>>> doing, but need to be focused on value-add, not minutia.)
>>>
>>>   
>>
>> This is excellent data.
>>>> 3. What don't you like?
>>>>     
>>>
>>> - It took way too much work to make all of this reliable and workable
>>> for a single process to use on a global basis.
>>> - Making everything work equally well for network-based and DVD-based
>>> installations was difficult.
>>> - When things don't work (which is extremely rarely - see 2 above)
>>> jumpstart is hard to debug because of lack of documented ways to
>>> observe the process along with lack of a documented way to restart the
>>> process without enduring POST, slow wanboot download, etc.  I know
>>> many tricks to get around this, but learning them was painful and
>>> often times only possible because of the extensive use of shell
>>> scripts during installation.
>>> - Automated installation has way too much of "every customer must
>>> figure it out for themselves."
>>>
>>> It seems as though the last point is supposed to be addressed with
>>> JASS and/or JET.  For me, JASS (now unmaintained and not yet open
>>> source) was a big help for automation of security hardening.  JET
>>> came to my attention long after
>>> JASS was already working (including various custom written modules).
>>> Almost every introduction I had to JET felt like a sales job for
>>> professional services.  That is, the thing that I needed didn't come
>>> with JET, but if I paid for some professional services they would
>>> provide it.  Well, the thing that I needed was typically easier to
>>> bolt onto JASS than it was to go through the requisition process for
>>> professional services.
>>>
>>> Striking the balance between everyone having to figure it out for
>>> themselves and lacking required flexibility is extremely difficult.  I
>>> would prefer that we currently err in favor of giving too much
>>> flexibility. Having flexibility will allow sysadmins to come up with
>>> clever ways of accomplishing what they need to do, hopefully leading
>>> to contributions of those clever things back to the community.  I
>>> worry that lack of flexibility will hinder adoption at large sites -
>>> all of which will suddenly sing the praises of jumpstart.
>>>
>>>   
>> Fair points. And, really good insight into the issues you encounter.
>>
>>
>> Thank you again for the data. We are listening and will include this 
>> input in our redesign process for AI.
>>
>> sarah
>> *****
>> _______________________________________________
>> caiman-discuss mailing list
>> caiman-discuss at opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
>
