Hi Jan,

> Hi Sarah,
>
>
> Sarah Jelinek wrote:
>> Hi Jan,
>>
>>
>>
>>> Hi Sarah,
>>>
>>> when I was doing initial investigation, it seemed to me that
>>> at the moment useradd fails the system is left in inconsistent
>>> state because of steps which happened before useradd was invoked.
>>> This is the reason, why I thought we would need to clean up
>>> the target before installation is restarted after useradd failure.
>>>
>>> But then I found out that target instantiation (TI) and transfer (TM)
>>> phases are actually started after useradd finishes and that the
>>> inconsistency was caused by the fact that orchestrator proceeded
>>> with installation even if useradd failed.
>>>
>>> It means that if orchestrator returns with failure immediately
>>> when it finds out that useradd wasn't successful and doesn't start
>>> TI nor TM, no actual changes are done to the target, so no cleanup
>>> needs to be done and install can be restarted successfully.
>>>
>>> So this fix doesn't address the case when target is in inconsistent
>>> state before installer is invoked - for example if it would fail
>>> for some reason during TI or TM phase and thus target would be
>>> already instantiated and some bits already transfered.
>>>
>>> There is still possibility (not addressed by this bug) that
>>> for some reason installer crashes during TI or TM and then
>>> the system might be left in state which would prevent user
>>> from restarting installer successfully - the question might
>>> be if there are valid scenarios when restarting installer
>>> would help to solve the underlying issue in these cases.
>>> Do you think that these possibilities should be more
>>> investigated ?
>>>   
>> I ran in to these cases myself while debugging and fixing bug 533. 
>> Basically if an install fails in the middle for some reason even user 
>> error the user cannot restart the installer unless they cleanup the 
>> left over targets. That seems broken to me.
>
> I see - thinking more about that particular scenario I agree with you
> installer should be able to deal with this situation in some way.
>
>> The fix you put in for this bug and the fix I will put in for 533 
>> will stop the orchestrator from continuing on in the event of 
>> failures prior to starting transfer.
>
> Agreed.
>
>>
>> My concern about not being able to restart the installer is that with 
>> the livecd environment in particular,  we don't disable the installer 
>> icon or program after a failed installation attempt. We tell them 
>> that TI failed, but we don't tell them how to fix it except they can 
>> reboot.
>>
>> We have a few choices with this(IMO):
>>
>> 1. Generate better error messages regarding TI failure, and how to clean
>> up the leftover configuration.
>> 2. Disable the installer icon until the user cleans up the stuff 
>> causing the failures. This is likely harder to do than might seem 
>> obvious since how are we going to be able to track what they have 
>> done to clean things up.
>> 3. Modify TI to recreate unconfigure any existing configuration and 
>> recreate the zpool/zfs datasets if doing an initial install. This 
>> means the installer won't fail due to this.
>
> I have taken a look at TI part and played for a while with the installer.
> After invoking it from command line, I let him finish TI and start TM.
> Then I interrupted the install process and tried to clean up the system,
> so that installer could be restarted.
>
> So far it seems to me that "zpool destroy -f rpool" could take care of
> cleaning up everything related to TI ZFS targets (root pool, datasets, 
> ...),
> which should be sufficient for TI to allow successfully instantiate 
> target
> again.
I will file a bug on this. I am going to mark this a stopper so it will 
need to be fixed for the May release.  I can help with this if you need. 
I will be assigning you as the RE as long as you are ok with that. If 
not, feel free to change this.
>
> I am not sure if there is anything we should be aware regarding
> TM part - I am CCing Moinak, who might be able to provide this
> information.
We do need to evaluate the robustness of the transfermod for restarts.
>
> Also, we probably would need to think about changes done by 
> orchestrator -
> for example I have noticed that I can't create the same user if I restart
> installer - not sure if something else in the system might be affected 
> as well.
>
Ah.. that's true. Because we have already added the user. I think we 
need to understand what the errors could be through the installation 
processing. I will file a bug on the orchestrator robustness and assign 
myself as the RE.

thanks,
sarah

Reply via email to