Hi Jan,
> Hi Sarah,
>
> Sarah Jelinek wrote:
>> Hi Jan,
>>
>>> Hi Sarah,
>>>
>>> When I was doing the initial investigation, it seemed to me that
>>> at the moment useradd fails, the system is left in an inconsistent
>>> state because of steps that happened before useradd was invoked.
>>> This is the reason why I thought we would need to clean up
>>> the target before the installation is restarted after a useradd failure.
>>>
>>> But then I found out that the target instantiation (TI) and transfer (TM)
>>> phases are actually started after useradd finishes, and that the
>>> inconsistency was caused by the fact that the orchestrator proceeded
>>> with the installation even if useradd failed.
>>>
>>> This means that if the orchestrator returns with a failure immediately
>>> when it finds out that useradd wasn't successful, and starts neither
>>> TI nor TM, no actual changes are made to the target, so no cleanup
>>> needs to be done and the install can be restarted successfully.
>>>
>>> So this fix doesn't address the case where the target is in an
>>> inconsistent state before the installer is invoked, for example if
>>> the installer failed for some reason during the TI or TM phase, so
>>> the target would already be instantiated and some bits already
>>> transferred.
>>>
>>> There is still a possibility (not addressed by this bug) that
>>> the installer crashes for some reason during TI or TM, and then
>>> the system might be left in a state that would prevent the user
>>> from restarting the installer successfully. The question is
>>> whether there are valid scenarios where restarting the installer
>>> would help solve the underlying issue in these cases.
>>> Do you think these possibilities should be investigated further?
>>>
>> I ran into these cases myself while debugging and fixing bug 533.
>> Basically, if an install fails in the middle for some reason, even user
>> error, the user cannot restart the installer unless they clean up the
>> leftover targets. That seems broken to me.
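The fail-fast ordering Jan describes can be sketched roughly as follows. This is a hypothetical Python illustration, not the orchestrator's actual code: `run_install` and the phase callables are made-up names standing in for useradd, TI, and TM.

```python
from typing import Callable, List, Tuple

# Hypothetical: each phase is a callable that returns True on success.
Phase = Callable[[], bool]


def run_install(phases: List[Tuple[str, Phase]]) -> Tuple[bool, List[str]]:
    """Run install phases in order and stop at the first failure.

    Returns (succeeded, started), where `started` lists every phase that
    was invoked. If useradd fails, TI and TM never appear in `started`,
    meaning nothing was written to the target.
    """
    started: List[str] = []
    for name, phase in phases:
        started.append(name)
        if not phase():
            # Fail fast: do not continue to later phases.
            return False, started
    return True, started


if __name__ == "__main__":
    ok, started = run_install([
        ("useradd", lambda: False),  # simulate a useradd failure
        ("TI", lambda: True),
        ("TM", lambda: True),
    ])
    print(ok, started)  # False ['useradd']
```

Because TI and TM are never started in the failure case, a retry begins from an untouched target, which is the property this fix relies on.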
>
> I see. Thinking more about that particular scenario, I agree with you
> that the installer should be able to deal with this situation in some way.
>
>> The fix you put in for this bug and the fix I will put in for 533
>> will stop the orchestrator from continuing in the event of
>> failures prior to starting the transfer.
>
> Agreed.
>
>> My concern about not being able to restart the installer is that, in
>> the livecd environment in particular, we don't disable the installer
>> icon or program after a failed installation attempt. We tell users
>> that TI failed, but we don't tell them how to fix it, other than that
>> they can reboot.
>>
>> We have a few choices with this (IMO):
>>
>> 1. Generate better error messages regarding TI failure, including how
>>    to clean up the leftover configuration.
>> 2. Disable the installer icon until the user cleans up whatever is
>>    causing the failures. This is likely harder to do than it might seem,
>>    since it isn't obvious how we would track what they have done to
>>    clean things up.
>> 3. Modify TI to unconfigure any existing configuration and recreate
>>    the zpool/zfs datasets if doing an initial install. This means the
>>    installer won't fail due to this.
>
> I have taken a look at the TI part and played with the installer for a while.
> After invoking it from the command line, I let it finish TI and start TM.
> Then I interrupted the install process and tried to clean up the system
> so that the installer could be restarted.
>
> So far it seems to me that "zpool destroy -f rpool" could take care of
> cleaning up everything related to the TI ZFS targets (root pool, datasets,
> ...), which should be sufficient to allow TI to successfully instantiate
> the target again.

I will file a bug on this. I am going to mark it as a stopper, so it will
need to be fixed for the May release. I can help with this if you need it.
I will assign you as the RE, as long as you are OK with that. If not, feel
free to change this.
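Option 3, combined with the "zpool destroy -f rpool" observation, might look something like the sketch below: before TI runs on an initial install, check for a leftover pool and destroy it. This is a hypothetical Python helper (`cleanup_stale_pool` is not part of the installer). It assumes the pool is named `rpool` and that destroying it is acceptable, which holds only when the target disk is being reinitialized anyway. The command runner is injectable so the logic can be exercised without ZFS.

```python
import subprocess
from typing import Callable, Sequence

# Returns a command's exit status; injectable so tests need no real ZFS.
Runner = Callable[[Sequence[str]], int]


def run_cmd(argv: Sequence[str]) -> int:
    """Run a command quietly and return its exit status (127 if not found)."""
    try:
        return subprocess.run(list(argv), stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode
    except FileNotFoundError:
        return 127


def cleanup_stale_pool(pool: str = "rpool", run: Runner = run_cmd) -> bool:
    """Destroy a pool left behind by an interrupted install, if one exists.

    Hypothetical helper: only safe on an initial install where the
    target disk is being reinitialized anyway.
    Returns True if a stale pool was destroyed, False if none was found.
    Raises RuntimeError if the pool exists but cannot be destroyed.
    """
    # `zpool list <pool>` exits non-zero when the pool does not exist.
    if run(["zpool", "list", "-H", "-o", "name", pool]) != 0:
        return False
    # Destroys the pool together with all of its datasets.
    if run(["zpool", "destroy", "-f", pool]) != 0:
        raise RuntimeError(f"could not destroy stale pool {pool!r}")
    return True
```

This covers only the TI side. Whether TM and the orchestrator's own changes (such as the already-created user) need similar cleanup is still an open question.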
>
> I am not sure if there is anything we should be aware of regarding the
> TM part. I am CCing Moinak, who might be able to provide this
> information.

We do need to evaluate the robustness of the transfermod for restarts.

>
> Also, we probably need to think about the changes made by the
> orchestrator. For example, I have noticed that I can't create the same
> user if I restart the installer. I'm not sure whether something else in
> the system might be affected as well.
>

Ah, that's true, because we have already added the user. I think we need to
understand what errors could occur throughout installation processing. I will
file a bug on the orchestrator robustness and assign myself as the RE.

thanks,
sarah
