Jan Damborsky wrote:
> I was investigating bug 8130 for a while in order to determine
> what the problem is and whether it might be considered a stopper
> for the 2009.06 release. I would like to share my thoughts and
> observations, since it seems that the problem is partially related
> to the chosen implementation, and at this point addressing it as a
> whole would be too risky.
>
> Problem:
> --------
> In the current implementation, the AI client boot process consists
> of several steps. Those relevant to 8130 are:
>
> [1] locating and downloading the boot_archive
> [2] locating and downloading the additional compressed archives
>     (solaris.zlib, solarismisc.zlib)
>
> In the current implementation, [1] and [2] are required to come
> from the same AI image. The issue is that in a specific
> configuration affecting SPARC clients, a mismatch between [1] and
> [2] can occur (the boot_archive is taken from a different AI image
> than the compressed archives).
>
> For x86, this mismatch doesn't occur, since both locations are
> specified in one place (the GRUB menu.lst file) and are always
> updated together.
>
> For SPARC, the two locations are specified separately, and there
> are scenarios in which they can currently get out of sync. The
> location of the boot archive is specified in the wanboot.conf file
> (as the 'root_file' option), while the location of the compressed
> archives is provided by the DHCP server as the 'RootPath' option.
>
> The mismatch doesn't occur if the AI SPARC client is explicitly
> associated with a given install service and AI image using the
> 'create-client' installadm(1M) subcommand. In that case, both the
> DHCP server and the wanboot.conf files are configured appropriately:
>
> * a client-specific DHCP macro containing the location of the
>   compressed archives is (re)created. It takes precedence over the
>   service-specific DHCP macro, which ensures that the client is
>   always provided with the correct 'RootPath' information.
>
> * a client-specific wanboot.conf file containing the location of
>   the boot_archive is (re)created in the
>   /etc/netboot/<network_address>/<client_id> directory. Again, it
>   takes precedence over the wanboot.conf files stored in other
>   locations within the /etc/netboot directory.
>
> The problematic scenario is when a SPARC AI client is not
> explicitly configured with the 'create-client' command. In that
> case, it is provided with the boot_archive picked up from the
> location specified in the /etc/netboot/wanboot.conf file, and with
> a RootPath option pointing to the location of the compressed
> archives, which is taken from the service-specific DHCP macro. Both
> are configured when the 'create-service' command is used to create
> an install service.
>
> The problem is that the /etc/netboot/wanboot.conf file is
> (re)written each time a new install service is created, but a
> service-specific DHCP macro is assigned to a pool of IP addresses
> (by calling pntadm(1M)) only when creation of a new pool of IP
> addresses is requested (by providing the -i and -c options).
>
> e.g.
> the problem occurs when:
>
> [1] the first install service is created along with a pool of IP
>     addresses:
> # installadm create-service -n service_1 -i <start_IP> -c <IP_pool_size> \
>   -s <ai_iso_image_1> <ai_image_1>
>
>   * /etc/netboot/wanboot.conf is created and points to the
>     boot_archive in <ai_image_1>
>
>   * the service-specific DHCP macro dhcp_macro_service_1 is created
>     with 'RootPath' pointing to <ai_image_1>
>
>   * the created IP addresses are assigned to the dhcp_macro_service_1
>     macro using the pntadm(1M) command
>
> [2] a second service is created:
> # installadm create-service -n service_2 -s <ai_iso_image_2> <ai_image_2>
>
>   * /etc/netboot/wanboot.conf is (re)created and points to the
>     boot_archive in <ai_image_2>
>
>   * the service-specific DHCP macro dhcp_macro_service_2 is created
>     with 'RootPath' pointing to <ai_image_2>, but it is not
>     associated with any IP addresses
>
> Now when a SPARC AI client boots, it picks up the boot archive from
> <ai_image_2> and the compressed archives from <ai_image_1>.
>
> [3] the second service is deleted along with its AI image:
> # installadm delete-service -x service_2
>
>   * /etc/netboot/wanboot.conf is left untouched and points to the
>     boot_archive in the already deleted <ai_image_2>
>
> Now when a SPARC AI client is booted, it fails while trying to
> obtain the boot_archive.
>
> Proposed final solution:
> ------------------------
> I think the final solution here is to work out the set of
> requirements we would like to address and to reconsider the
> existing design and implementation with respect to:
>
> * which install service scopes should be available
>   - currently for SPARC we can explicitly associate an install
>     service with a particular client (identified by its MAC
>     address) and use one other service for the rest of the SPARC
>     clients. More than one service serving a broader scope can't
>     be created, since only one /etc/netboot/wanboot.conf file can
>     exist.
>
> * how a SPARC client obtains the location of AI images
>   - currently it is spread across two places, one for the
>     boot_archive and one for the compressed archives. It should be
>     consolidated so that it is less error-prone and easier to
>     maintain.
>
> Proposed fix for now:
> ---------------------
> For now, significant design changes are not appropriate, since
> they would be too risky. Based on this, I am considering the
> following temporary solution until the final approach can be taken:
>
> * when a new service is created, don't touch
>   /etc/netboot/wanboot.conf if it contains a pointer to an existing
>   boot archive. This makes sure that once /etc/netboot/wanboot.conf
>   is created for one service, it is not accidentally overwritten by
>   another service. Clients would therefore continue to use the first
>   service as the default (when 'create-client' is not called), and
>   the mismatch would be avoided in this case.
>
> * when a service is deleted along with its associated AI image (by
>   passing the '-x' option), and /etc/netboot/wanboot.conf contains
>   a pointer to the boot archive in that image,
>   /etc/netboot/wanboot.conf will be deleted along with that AI
>   image. This prevents /etc/netboot/wanboot.conf from pointing to a
>   non-existent AI image.
>
> With those changes applied, the behavior for SPARC clients would be
> similar to that for x86 clients.
>
> I have prepared a preliminary fix with those changes and tested it
> with both SPARC and x86 clients.
>
> The preliminary webrev is available at the following location:
> http://cr.opensolaris.org/~dambi/bug-8130/
>
> Please let me know whether you think this problem qualifies as a
> stopper for 2009.06, whether there are other related issues I have
> not noticed, and whether the solution described above is acceptable
> or a different approach should be taken. Any comments are highly
> appreciated.
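To make the step [1] to [3] scenario and the proposed temporary fix easier to follow, here is a toy Python model of the server state. It is a minimal sketch under stated assumptions, not installadm code: `BootServer`, `create_service`, `delete_service`, `boot_sources`, and `run_scenario` are hypothetical names, each AI image is reduced to a string, /etc/netboot/wanboot.conf is reduced to the image its root_file points into, and only a single IP pool (assigned to one service macro via pntadm(1M)) is modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class BootServer:
    """Hypothetical toy model of the AI server state involved in bug 8130."""

    keep_existing_wanboot: bool = False  # True models the proposed temporary fix
    images: Set[str] = field(default_factory=set)  # AI images present on disk
    macros: Dict[str, str] = field(default_factory=dict)  # service macro -> RootPath image
    wanboot_root: Optional[str] = None  # image that /etc/netboot/wanboot.conf points into
    pool_macro: Optional[str] = None  # service whose macro the IP pool is assigned to

    def create_service(self, name: str, image: str, create_pool: bool = False) -> None:
        self.images.add(image)
        self.macros[name] = image
        # Current behavior: wanboot.conf is rewritten for every new service.
        # Proposed fix: leave it alone while it points to an existing image.
        if not (self.keep_existing_wanboot and self.wanboot_root in self.images):
            self.wanboot_root = image
        # The IP pool is tied to a service macro only when -i/-c are given.
        if create_pool:
            self.pool_macro = name

    def delete_service(self, name: str, remove_image: bool = False) -> None:
        image = self.macros.pop(name)
        if remove_image:  # models 'installadm delete-service -x'
            self.images.discard(image)
            # Proposed fix: drop wanboot.conf if it pointed into the deleted image.
            if self.keep_existing_wanboot and self.wanboot_root == image:
                self.wanboot_root = None

    def boot_sources(self) -> Tuple[Optional[str], Optional[str]]:
        """(boot_archive image, compressed-archive image) for a client
        that was not set up with 'create-client'."""
        return self.wanboot_root, self.macros.get(self.pool_macro)


def run_scenario(keep_existing_wanboot: bool):
    """Replay steps [1]-[3]; return (sources after [2], boot_archive missing after [3])."""
    srv = BootServer(keep_existing_wanboot=keep_existing_wanboot)
    srv.create_service("service_1", "ai_image_1", create_pool=True)  # step [1]
    srv.create_service("service_2", "ai_image_2")  # step [2]
    after_step2 = srv.boot_sources()
    srv.delete_service("service_2", remove_image=True)  # step [3]
    root, _ = srv.boot_sources()
    return after_step2, root is not None and root not in srv.images


if __name__ == "__main__":
    for fix in (False, True):
        print("with fix:" if fix else "current: ", run_scenario(fix))
```

With `keep_existing_wanboot=False` the model reproduces both failures described above (mismatched images after step [2], a dangling boot_archive pointer after step [3]); with `True` both disappear, because the first service stays the default until its own image is removed.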
>
> Thank you very much in advance,
> Jan
>
> _______________________________________________
> caiman-discuss mailing list
> caiman-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
Jan,

Thanks for looking into this, and for the great description!

Please forgive me if my perspective is not correct; I'm learning how
this works as I go.

Until the existing design can be reworked, it seems to me that a
safer approach might be to disallow a second invocation of
"installadm create-service" on SPARC. If a SPARC user issues
"installadm create-service" when a service has already been created,
issue an error message stating that only one service is currently
allowed on SPARC and that "installadm delete-service" must be run
before creating a second service.

OK, I can imagine this is not flexible for customers, but it might be
a safe approach to help avoid customer problems and confusion until
the design can be reworked.

Again, I apologize if my perspective is naive. I'm just thinking here
and trying to help.

Joe
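A minimal sketch of the guard suggested above, written in Python for illustration only. `check_create_service` and `ServiceLimitError` are hypothetical names, not part of installadm. The sketch assumes the caller already knows the architecture of each existing service, and it reads "only one service on SPARC" as "only one SPARC install service", leaving x86 services unaffected.

```python
from typing import Mapping


class ServiceLimitError(Exception):
    """Hypothetical error raised when a second SPARC service is requested."""


def check_create_service(existing: Mapping[str, str], new_name: str, new_arch: str) -> None:
    """Refuse to create a second SPARC install service.

    existing maps service name -> architecture ("sparc" or "x86"); this
    layout is an assumption for illustration, not installadm's real data.
    """
    if new_arch != "sparc":
        return  # x86 services are not restricted by this check
    sparc = sorted(name for name, arch in existing.items() if arch == "sparc")
    if sparc:
        raise ServiceLimitError(
            f"cannot create SPARC service '{new_name}': only one SPARC install "
            f"service is currently allowed (existing: {', '.join(sparc)}); "
            "run 'installadm delete-service' first"
        )
```

A call such as `check_create_service({"service_1": "sparc"}, "service_2", "sparc")` raises `ServiceLimitError`, while creating an x86 service next to it still passes. The trade-off is the one noted above: it is less flexible, but it keeps wanboot.conf and the DHCP macro pointing at the same image.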