Hi Sarah,

Sarah Jelinek wrote:
> Hi Jan,
>
>
>> Hi Sarah,
>>
>>
>> Sarah Jelinek wrote:
>>> Hi Jan,
>>>
>>>
>>>
>>>> Hi Sarah,
>>>>
>>>> when I was doing my initial investigation, it seemed to me that
>>>> at the moment useradd fails, the system is left in an inconsistent
>>>> state because of steps which happened before useradd was invoked.
>>>> This is the reason why I thought we would need to clean up
>>>> the target before the installation is restarted after a useradd failure.
>>>>
>>>> But then I found out that the target instantiation (TI) and transfer (TM)
>>>> phases are actually started after useradd finishes, and that the
>>>> inconsistency was caused by the fact that the orchestrator proceeded
>>>> with the installation even if useradd failed.
>>>>
>>>> It means that if the orchestrator returns with a failure immediately
>>>> when it finds out that useradd wasn't successful, and starts neither
>>>> TI nor TM, no actual changes are made to the target, so no cleanup
>>>> needs to be done and the install can be restarted successfully.
>>>>
>>>> So this fix doesn't address the case when the target is already in an
>>>> inconsistent state before the installer is invoked - for example, if
>>>> the installer failed for some reason during the TI or TM phase, the
>>>> target would already be instantiated and some bits already transferred.
>>>>
>>>> There is still a possibility (not addressed by this bug) that
>>>> for some reason the installer crashes during TI or TM and then
>>>> the system might be left in a state which would prevent the user
>>>> from restarting the installer successfully - the question might
>>>> be whether there are valid scenarios where restarting the installer
>>>> would help to solve the underlying issue in these cases.
>>>> Do you think that these possibilities should be investigated
>>>> further?
>>>>
>>> I ran into these cases myself while debugging and fixing bug 533.
>>> Basically, if an install fails in the middle for some reason, even user
>>> error, the user cannot restart the installer unless they clean up the
>>> leftover targets. That seems broken to me.
>> I see - thinking more about that particular scenario, I agree with you
>> that the installer should be able to deal with this situation in some way.
>>
>>> The fix you put in for this bug and the fix I will put in for 533
>>> will stop the orchestrator from continuing in the event of
>>> failures prior to starting transfer.
>> Agreed.
>>
>>> My concern about not being able to restart the installer is that with
>>> the livecd environment in particular, we don't disable the installer
>>> icon or program after a failed installation attempt. We tell them
>>> that TI failed, but we don't tell them how to fix it, except that they
>>> can reboot.
>>>
>>> We have a few choices with this (IMO):
>>>
>>> 1. Generate better error messages regarding TI failure, and how to clean
>>> up the leftover configuration.
>>> 2. Disable the installer icon until the user cleans up the stuff
>>> causing the failures. This is likely harder to do than might seem
>>> obvious, since how are we going to be able to track what they have
>>> done to clean things up?
>>> 3. Modify TI to unconfigure any existing configuration and
>>> recreate the zpool/zfs datasets if doing an initial install. This
>>> means the installer won't fail due to this.
>> I have taken a look at the TI part and played with the installer for a while.
>> After invoking it from the command line, I let it finish TI and start TM.
>> Then I interrupted the install process and tried to clean up the system,
>> so that the installer could be restarted.
>>
>> So far it seems to me that "zpool destroy -f rpool" could take care of
>> cleaning up everything related to TI ZFS targets (root pool, datasets,
>> ...), which should be sufficient to allow TI to successfully instantiate
>> the target again.
> I will file a bug on this. I am going to mark this a stopper so it will
> need to be fixed for the May release. I can help with this if you need.
> I will be assigning you as the RE as long as you are ok with that. If
> not, feel free to change this.

I have just accepted the bug. So far it seems to me that cleaning up after TI
might be straightforward - but I am going to do a deeper investigation.
Thank you for offering to help!

>> I am not sure if there is anything we should be aware of regarding
>> the TM part - I am CCing Moinak, who might be able to provide this
>> information.
> We do need to evaluate the robustness of the transfermod for restarts.

Agreed - I will follow up with Moinak on this.

>> Also, we probably would need to think about changes made by the
>> orchestrator - for example, I have noticed that I can't create the same
>> user if I restart the installer - not sure if something else in the
>> system might be affected as well.
>>
> Ah.. that's true. Because we have already added the user. I think we
> need to understand what the errors could be through the installation
> processing. I will file a bug on the orchestrator robustness and assign
> myself as the RE.

Thank you very much,
Jan

>
> thanks,
> sarah
> _______________________________________________
> caiman-discuss mailing list
> caiman-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
