Hi Sarah,
On 04/16/09 16:40, Sarah Jelinek wrote:
> Hi Jan,
>
> Great writeup of the issues! Thank you for taking the time to
> investigate this so thoroughly.

You are quite welcome!

> I have some comments/questions inline..

Please see my responses inline.
Thank you for your comments!
Jan

>>
>>
>> x86 - first created service has 'default' scope
>> -----------------------------------------------
>> When local DHCP server is to be used, the first install service is
>> created along with -i -c options and IP pool is associated with
>> service-specific DHCP macro. Thus it is used as default, since
>> client loads menu.lst file defined in service-specific macro.
>>
>> Any subsequent service (if created without -i -c) doesn't update IP pool
>> by calling pntadm(1M) leaving the first service used as 'default'.
>>
>> Issues:
>> - It is a little bit problematic to change the default service
>>   (e.g. delete old/create new), since appropriate IP pool has to be
>>   updated with new dhcp service-specific macro. This is now only done
>>   if '-i -c' options are provided and user has to know which IP pool
>>   is to be updated.

> So, when we delete a service, we don't delete the service specific
> macro on the dhcp server? So, this data for this service is still kept?

Yes - this issue is tracked by bug 4526.

>>
>> - If default service is deleted, client is still served with menu.lst
>>   file referring to it, as IP pool is still associated with service
>>   specific DHCP macro.
>>
> We don't delete the dchp macro when we delete the service?

No, we don't - please see above.

> Isn't the image gone at this point though, so even if the client is
> served the menu.lst it will fail?

It will fail in different phases, depending on how the service is deleted:

[1] image is deleted along with service
    # installadm delete-service -x <service_name>

    AI client fails in GRUB when trying to obtain kernel & boot archive

[2] only service is deleted, image is left on system (might be in use
    by other service)
    # installadm delete-service <service_name>

    AI client fails in service discovery phase when trying to locate
    already deleted service <service_name>.

>> - First one is always set as a default - this could be addressed by
>>   explicitly specifying which one should be set by default (e.g.
>>   by providing '-d' option as suggested).
>>
>>
>> Sparc - last created service has 'default' scope
>> ------------------------------------------------
>> The reason is that each time new service is created,
>> /etc/netboot/wanboot.conf is updated. Specifying -i -c doesn't
>> affect this behavior, since DHCP server would be queried only for
>> network info (client IP, server IP, ...) and boot file (which is
>> wanboot-cgi common for all clients). Once [A] is fixed, 'RootPath'
>> option would be no longer utilized.
>>
>> Issues:
>> - Last one is always set as default - this could be addressed by
>>   explicitly specifying which one should be set by default (e.g.
>>   by providing '-d' option as suggested).
>>
>> I think we would need to clarify
>> * desired behavior for [2] along with corner cases and possible states

> A couple of questions:
>
> 1. If we used -d, could we update the ip pool for the user in the case
> of x86? Do we have the data to do this? Seems like -d for sparc is
> straightforward. Not so much for x86.

I agree it is straightforward for Sparc, since we don't need to update
the IP pool once it is created.

I am not sure there is an easy way to accomplish this for x86.
As far as I am aware, we don't save information about the IP pools
created or how they are associated with a given service.
My current understanding is that if we want to get rid of the macros
associated with the IP pool created for a given service, we would need
to destroy the IP pool as well, since the 'macro' field for a particular
IP can't be left empty. But I might be wrong on this point.

We might consider a brute-force approach for now, such as
unconditionally updating everything, but I am not sure it is a feasible
solution.

>
>
> 2. If the dhcp server is on a separate machine, we do not currently
> update the macro or pool data, correct?

Correct - we don't. We only print what should be done, and the user is
expected to take care of it.

>
> These are some issues I see with default behavior that we need to
> think about(off the top of my head):
>
> -We need to ensure that we clean up everything associated with a
> default service, IP pool, wanboot.conf, images, etc
> I am not sure what the state of our cleanup code in installadm is when
> we delete a service.

Agreed. Some stuff is cleaned up, but not all - please see bugs 4526
and 8198, which should reflect the current state of things.

>
> -If we cannot do the correct cleanup for a deleted default service, we
> fail, and don't allow the user to create another default. Not sure how
> we would note this 'failure' and keep track of this so we could fail
> the creating of another default service.

Currently, the state of a service is captured in its SMF service
property called 'status'. We now recognize 'on' and 'off'; I think we
might add more values to reflect other possible states of an install
service, like 'degraded' if the delete operation fails. We would then
also need to work out a mechanism for cleaning up those services.

>
> -If the dhcp server is on a different machine than the install server,
> I think we run in to issues if we rely on the users to delete the
> associated macros. We cannot enforce this at this time.

Agreed.
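
To make the 'status' idea above a bit more concrete, here is a minimal
sketch in Python. It is only an illustration, not installadm code: the
names ServiceStatus, status_after_delete and can_create_default, as
well as the cleanup step names, are hypothetical, and 'degraded' is the
proposed value, not one that is recognized today.

```python
from enum import Enum


class ServiceStatus(Enum):
    ON = "on"              # recognized today
    OFF = "off"            # recognized today
    DEGRADED = "degraded"  # proposed: delete left some pieces behind


def status_after_delete(cleanup_results):
    """Return (status, leftovers) for a delete attempt.

    cleanup_results maps a (hypothetical) cleanup step name to
    True/False. status is None when every step succeeded, i.e. the
    service is completely gone.
    """
    leftovers = sorted(step for step, ok in cleanup_results.items() if not ok)
    if leftovers:
        return ServiceStatus.DEGRADED, leftovers
    return None, []


def can_create_default(statuses):
    """Refuse a new default service while any service is degraded."""
    return ServiceStatus.DEGRADED not in statuses


# Example: the DHCP macro could not be removed, so the service would be
# kept around as 'degraded' instead of silently disappearing.
status, leftovers = status_after_delete(
    {"dhcp_macro": False, "ip_pool": True, "wanboot_conf": True}
)
print(status, leftovers)
```

The list of leftovers is what a later clean-up mechanism (or a message
to the user) would work from.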
>
> -If we implement -d(default) for this release, what are the
> implications of using the new command on services setup prior to use
> actually explicitly defining a default service?

I think it depends on the level of incompatibility that would be
introduced. I think we will find out once we dig into the more detailed
design.

>
> -My take on what is desired behavior and attributes for a 'default'
> service(high level thoughts):
>
> 1. It has all components necessary for the specific architecture
> defined.
>
> x86: appropriate service specific dhcp_macro and ip pools setup
> on dhcp server(even if it is a remote server), complete service
> image
> sparc: appropriate wanboot.conf data available in
> /etc/netboot/wanboot.conf, service specific dhcp macro setup on
> dhcp server, complete service image

For Sparc, the macro in the IP pool is actually not service specific,
as it will not contain any service-specific information once 8130 is
fixed. So for Sparc we will not need to update the IP pool each time
the default service is changed.

>
> 2. There is only 1 default service for each supported architecture
> at any time on an install server.

This is a good point - theoretically one default service could serve
both architectures, but I am not sure whether this actually works in
the current implementation, i.e. whether AI images for both x86 and
Sparc can be associated with one service.

> And, the service must explicitly
> be setup as the 'default' service via installadm -d.
>
> This begs the question.. is it required to have a default
> service on an install server for each architecture?
> In the case
> of sparc if we don't have a 'default', this is problematic for
> all sparc clients, right? Even those that are setup with
> create-client?

No, a 'default' service is not required for create-client to work
correctly on Sparc; any service can be associated with a Sparc client.

>
> 3. If we cannot setup all components of a default service, we fail.
> Or we provide an override flag if user wants to go ahead and have us
> setup what we can.
>
> We need to figure out what is allowable as user controlled setup
> in this scenario.
>
> I'll think on this more and reply again.
>
>> * what is appropriate/desirable to address for the current release
>>
> Based on the question above, it is clear we have a lot of thinking to
> do about the design of a 'default' service. What I think we can do for
> this release is the following:
>
> 1. Fix the mismatch issue for sparc, perhaps as you have defined
> above with the wanboot changes.

Agreed - I am putting a fix together and investigating whether it works
as assumed.

>
> 2. With regard to the -d, explicit setting of the default service I
> think we have a couple of choices:
>
> -We do not implement -d. I think the design of this is going to
> take some careful thought.

Agreed.

>
> Part of the problem we have now is that the design doesn't
> hold for default services. And, we are missing some key
> functionality, such as setup of dhcp macros and pools on
> remote dhcp servers. We allow the implicit behavior that is
> currently in the implementation to stand.
>
> or.. we implement -d with the bare minimum. What I consider bare
> minimum is:
>
> -We overwrite /etc/netboot/wanboot.conf for sparc
>
> -We cleanup the dhcp macro and ip pool for the users on x86
> if we can. If not, we output messages telling them what they
> must do.
>
> -We ensure that when we delete a service we delete all the
> pieces we need to so we don't get clients booting from old
> menu.lst files or wanboot.conf. If we cannot delete all the
> pieces we output messages telling them what they must do.
>
> 3. We update the docs and manpage to explicitly let users know:
>
> If we do not implement -d:
>
> -The last sparc service created on an install service is the
> default
> -The first x86 service created on an install server is the
> default
> -to change x86 they need to ensure the dhcp macro and IP
> pool data is updated
> -If they don't want to use whatever 'default' is on the
> server, they must explicitly do a create-client
>
> Do you have any sense of the amount of work it would take to do the
> bare minimum for the -d support? Do you have any ideas on what you think the bare minimum might be?

I concur with what you are proposing with respect to the minimum we
need to make 'default' work in a feasible way:

* fix the 'clean up part' (bug 4526), so that AI server and AI images
  are not confused by leftovers (invalid menu.lst, dhcp macros,
  wanboot.conf, ...). I am not sure right now how much work is
  involved, but I think it might not be easy, since other issues might
  surface when digging into this.

* determine whether we need a 'default' for each architecture. If it is
  needed, this might introduce other things to think about - e.g. if we
  have two defaults, do we need to completely separate them? Do we also
  need two separate IP pools? How do we identify which one is for Sparc
  and which for x86? ... This also needs to be investigated; I am not
  sure how long it might take - it depends on the outcome of the
  investigation.

* '-d' support for Sparc - populate /etc/netboot/wanboot.conf only
  if '-d' is specified. It is straightforward and easy to implement.

* '-d' support for x86 - not sure about the work involved - needs to be
  investigated.

In general, my current feeling is that it is a significant amount of
work, since we basically need to work out a design that will be
compatible with the final long-term solution, pick a subset of features
and implement them. During that process we may encounter various issues
we are currently not aware of.

> We have to try to be sure whatever bare minimum we implement, if we
> decide to go this way, doesn't cause other design issues later.

Agreed - I think it would mean that we should work out the complete
design in as much detail as necessary and implement the feasible subset
for now.
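
As a starting point for that design work, the selection rules discussed
in this thread can be written down as a small Python sketch. It is a
model only, under a few assumptions that are not settled above: every
service belongs to exactly one architecture, the implicit x86 rule is
simplified to "first created" (in reality it only applies to the
service created with -i -c against a local DHCP server), and when
several services were marked with the proposed '-d', the most recent
one wins. The function name pick_default and the dictionary keys are
hypothetical.

```python
def pick_default(services, arch):
    """Return the name of the default service for arch, or None.

    services is a list of dicts in creation order with the keys
    'name', 'arch' and, optionally, 'explicit_default' (the proposed -d).
    """
    candidates = [s for s in services if s["arch"] == arch]
    if not candidates:
        return None

    explicit = [s for s in candidates if s.get("explicit_default")]
    if explicit:
        # Assumption: the most recent explicit request wins.
        return explicit[-1]["name"]

    # Implicit behavior described in the thread:
    # x86 keeps the first service, Sparc takes the last one created.
    chosen = candidates[0] if arch == "x86" else candidates[-1]
    return chosen["name"]


services = [
    {"name": "x86_a", "arch": "x86"},
    {"name": "sparc_a", "arch": "sparc"},
    {"name": "x86_b", "arch": "x86"},
    {"name": "sparc_b", "arch": "sparc"},
]
print(pick_default(services, "x86"), pick_default(services, "sparc"))
```

Writing the rule down this way makes the asymmetry between the two
architectures explicit, which is exactly what '-d' is meant to remove.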