Hi Sarah,

Sarah Jelinek wrote:
> Hi Jan,
>
>
>> Hi Sarah,
>>
>>
>> Sarah Jelinek wrote:
>>> Hi Jan,
>>>
>>>
>>>
>>>> Hi Sarah,
>>>>
>>>> when I was doing initial investigation, it seemed to me that
>>>> at the moment useradd fails the system is left in inconsistent
>>>> state because of steps which happened before useradd was invoked.
>>>> This is the reason, why I thought we would need to clean up
>>>> the target before installation is restarted after useradd failure.
>>>>
>>>> But then I found out that target instantiation (TI) and transfer (TM)
>>>> phases are actually started after useradd finishes and that the
>>>> inconsistency was caused by the fact that orchestrator proceeded
>>>> with installation even if useradd failed.
>>>>
>>>> It means that if orchestrator returns with failure immediately
>>>> when it finds out that useradd wasn't successful and doesn't start
>>>> TI nor TM, no actual changes are done to the target, so no cleanup
>>>> needs to be done and install can be restarted successfully.
>>>>
>>>> So this fix doesn't address the case when target is in inconsistent
>>>> state before installer is invoked - for example if it would fail
>>>> for some reason during TI or TM phase and thus target would be
>>>> already instantiated and some bits already transferred.
>>>>
>>>> There is still possibility (not addressed by this bug) that
>>>> for some reason installer crashes during TI or TM and then
>>>> the system might be left in state which would prevent user
>>>> from restarting installer successfully - the question might
>>>> be if there are valid scenarios when restarting installer
>>>> would help to solve the underlying issue in these cases.
>>>> Do you think that these possibilities should be investigated
>>>> further?
>>>>   
>>> I ran into these cases myself while debugging and fixing bug 533.
>>> Basically, if an install fails in the middle for some reason, even user
>>> error, the user cannot restart the installer unless they clean up the
>>> leftover targets. That seems broken to me.
>> I see - thinking more about that particular scenario, I agree with you
>> that the installer should be able to deal with this situation in some way.
>>
>>> The fix you put in for this bug and the fix I will put in for 533 
>>> will stop the orchestrator from continuing on in the event of 
>>> failures prior to starting transfer.
>> Agreed.
>>
>>> My concern about not being able to restart the installer is that with 
>>> the livecd environment in particular, we don't disable the installer
>>> icon or program after a failed installation attempt. We tell them 
>>> that TI failed, but we don't tell them how to fix it except they can 
>>> reboot.
>>>
>>> We have a few choices with this (IMO):
>>>
>>> 1. Generate better error messages regarding TI failure, and how to clean
>>> up the leftover configuration.
>>> 2. Disable the installer icon until the user cleans up the stuff 
>>> causing the failures. This is likely harder to do than might seem 
>>> obvious since how are we going to be able to track what they have 
>>> done to clean things up.
>>> 3. Modify TI to unconfigure any existing configuration and
>>> recreate the zpool/zfs datasets if doing an initial install. This
>>> means the installer won't fail due to this.
>> I have taken a look at the TI part and played for a while with the
>> installer. After invoking it from the command line, I let it finish TI
>> and start TM. Then I interrupted the install process and tried to clean
>> up the system, so that the installer could be restarted.
>>
>> So far it seems to me that "zpool destroy -f rpool" could take care of
>> cleaning up everything related to TI ZFS targets (root pool, datasets,
>> ...), which should be sufficient to allow TI to successfully instantiate
>> the target again.
> I will file a bug on this. I am going to mark this a stopper so it will 
> need to be fixed for the May release.  I can help with this if you need. 
> I will be assigning you as the RE as long as you are ok with that. If 
> not, feel free to change this.

I have just accepted the bug. So far it seems to me that the TI cleanup
might be straightforward - but I am going to investigate it more deeply.
Thank you for offering to help!
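
A minimal sketch of such a cleanup step, assuming that destroying the
leftover root pool with "zpool destroy -f" is enough for TI to start over.
This is not the actual TI code: the helper names pool_exists and
cleanup_root_pool are hypothetical, and the command runner is passed in as
a parameter only so that the logic can be exercised without touching a
real pool.

```python
import subprocess


def pool_exists(name, run=subprocess.run):
    """Return True if `zpool list` reports a pool with the given name."""
    # -H drops the header line, -o name prints only the pool names.
    result = run(["zpool", "list", "-H", "-o", "name"],
                 capture_output=True, text=True)
    if result.returncode != 0:
        return False
    return name in result.stdout.split()


def cleanup_root_pool(name="rpool", run=subprocess.run):
    """Destroy a leftover root pool, if any, before TI is started again.

    Returns True if a pool was destroyed, False if there was nothing to do.
    Assumption: the pool name is known up front (rpool in this thread).
    """
    if not pool_exists(name, run):
        return False
    # Raises CalledProcessError if the destroy itself fails.
    run(["zpool", "destroy", "-f", name], check=True)
    return True
```

Called before target instantiation on an initial install, this would be a
no-op on a clean system and would only remove the pool left behind by an
interrupted attempt. Whether anything beyond the pool needs cleaning up is
still open.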

>> I am not sure if there is anything we should be aware of regarding
>> the TM part - I am CCing Moinak, who might be able to provide this
>> information.
> We do need to evaluate the robustness of the transfermod for restarts.

Agreed - I will follow up with Moinak on this.

>> Also, we would probably need to think about changes done by the
>> orchestrator - for example, I have noticed that I can't create the same
>> user if I restart the installer - not sure if something else in the
>> system might be affected as well.
>>
> Ah.. that's true. Because we have already added the user. I think we 
> need to understand what the errors could be through the installation 
> processing. I will file a bug on the orchestrator robustness and assign 
> myself as the RE.
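
One way to picture the two orchestrator behaviours discussed above (stop
before TI/TM when useradd fails, and tolerate a user left over from an
earlier attempt) is the sketch below. It is only an illustration under
assumptions: create_user and run_install are hypothetical names, the real
orchestrator is not written this way, and the sketch assumes the user was
added to the running environment's passwd database.

```python
import pwd
import subprocess


def user_exists(login):
    """Return True if the login is already in the passwd database."""
    try:
        pwd.getpwnam(login)
        return True
    except KeyError:
        return False


def create_user(login, exists=user_exists, run=subprocess.run):
    """Make sure the user is present; return True on success.

    A user left over from a previous install attempt is treated as success,
    so that restarting the installer does not fail on this step.
    """
    if exists(login):
        return True
    return run(["useradd", login]).returncode == 0


def run_install(login, start_ti, start_tm, **kwargs):
    """Return False right away if user creation fails, before TI and TM."""
    if not create_user(login, **kwargs):
        return False  # nothing has been done to the target yet
    start_ti()
    start_tm()
    return True
```

With this ordering a useradd failure leaves the target untouched, so no
cleanup is needed before the next attempt.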

Thank you very much,
Jan

>
> thanks,
> sarah
> _______________________________________________
> caiman-discuss mailing list
> caiman-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss

