[caiman-discuss] Code review request for bug 88 - installer goes nuts when useradd fails

jan damborsky Thu, 13 Mar 2008 13:03:28 +0100

Hi Sarah,


Sarah Jelinek wrote:
> Hi Jan,
>
>
>
>> Hi Sarah,
>>
>> when I was doing the initial investigation, it seemed to me that
>> at the moment useradd fails the system is left in an inconsistent
>> state because of steps which happened before useradd was invoked.
>> This is why I thought we would need to clean up the target
>> before the installation is restarted after a useradd failure.
>>
>> But then I found out that target instantiation (TI) and transfer (TM)
>> phases are actually started after useradd finishes and that the
>> inconsistency was caused by the fact that orchestrator proceeded
>> with installation even if useradd failed.
>>
>> It means that if the orchestrator returns with failure immediately
>> when it finds out that useradd wasn't successful and starts neither
>> TI nor TM, no actual changes are made to the target, so no cleanup
>> needs to be done and the install can be restarted successfully.
>>
>> So this fix doesn't address the case when the target is in an
>> inconsistent state before the installer is invoked, for example if
>> it failed for some reason during the TI or TM phase and the target
>> was thus already instantiated and some bits already transferred.
>>
>> There is still a possibility (not addressed by this bug) that
>> for some reason the installer crashes during TI or TM and then
>> the system might be left in a state which would prevent the user
>> from restarting the installer successfully. The question is
>> whether there are valid scenarios in which restarting the installer
>> would help to solve the underlying issue in these cases.
>> Do you think that these possibilities should be
>> investigated further?
>>   
> I ran into these cases myself while debugging and fixing bug 533. 
> Basically, if an install fails in the middle for some reason, even user 
> error, the user cannot restart the installer unless they clean up the 
> leftover targets. That seems broken to me.

I see. Thinking more about that particular scenario, I agree with you that
the installer should be able to deal with this situation in some way.

> The fix you put in for this bug 
> and the fix I will put in for 533 will stop the orchestrator from 
> continuing on in the event of failures prior to starting transfer.

Agreed.
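
To make the agreed behaviour concrete, here is a minimal sketch in Python
(not the actual orchestrator code). The step names and the run_install()
helper are hypothetical, for illustration only. It shows only that once an
early step such as useradd fails, TI and TM are never started.

```python
from typing import Callable, List, Tuple

# Hypothetical step type: a name plus a callable returning True on success.
Step = Tuple[str, Callable[[], bool]]


def run_install(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first failure.

    Returns (success, names of the steps that were actually started).
    """
    started: List[str] = []
    for name, action in steps:
        started.append(name)
        if not action():
            # Fail fast: later steps (e.g. TI, TM) never run,
            # so they cannot modify the target.
            return False, started
    return True, started


if __name__ == "__main__":
    steps = [
        ("useradd", lambda: False),  # simulate a useradd failure
        ("TI", lambda: True),
        ("TM", lambda: True),
    ]
    print(run_install(steps))
```

Since nothing after the failed step runs, the target is untouched and no
cleanup is needed before a restart.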

>
> My concern about not being able to restart the installer is that with 
> the livecd environment in particular,  we don't disable the installer 
> icon or program after a failed installation attempt. We tell them that 
> TI failed, but we don't tell them how to fix it except they can reboot.
>
> We have a few choices with this (IMO):
>
> 1. Generate better error messages regarding TI failure, and how to clean
> up the leftover configuration.
> 2. Disable the installer icon until the user cleans up the stuff causing 
> the failures. This is likely harder to do than might seem obvious since 
> how are we going to be able to track what they have done to clean things up.
> 3. Modify TI to unconfigure any existing configuration and 
> recreate the zpool/zfs datasets if doing an initial install. This means 
> the installer won't fail due to this.

I have taken a look at the TI part and played with the installer for a while.
After invoking it from the command line, I let it finish TI and start TM.
Then I interrupted the install process and tried to clean up the system,
so that the installer could be restarted.

So far it seems to me that "zpool destroy -f rpool" could take care of
cleaning up everything related to TI ZFS targets (root pool, datasets, ...),
which should be sufficient to allow TI to instantiate the target
successfully again.
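
As a rough sketch of how such a cleanup step could be wrapped (in Python,
and not taken from the TI code), the helper below only builds and runs the
"zpool destroy -f" command mentioned above. The function names and the
injectable "run" parameter are assumptions made for illustration; whether
destroying the pool is really sufficient still needs to be confirmed.

```python
import subprocess
from typing import Callable, List, Sequence


def destroy_pool_command(pool: str) -> List[str]:
    """Build the argv for forcibly destroying a ZFS pool."""
    if not pool or pool.startswith("-"):
        raise ValueError("invalid pool name: %r" % pool)
    return ["zpool", "destroy", "-f", pool]


def cleanup_target(pool: str = "rpool",
                   run: Callable[[Sequence[str]], int] = subprocess.call) -> bool:
    """Destroy the pool left behind by a failed install.

    Returns True if the command exited with status 0. A non-zero status
    may simply mean that the pool did not exist.
    """
    return run(destroy_pool_command(pool)) == 0
```

The "run" parameter exists only so that the helper can be exercised without
touching a real pool.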

I am not sure if there is anything we should be aware of regarding
the TM part. I am CCing Moinak, who might be able to provide this
information.

Also, we would probably need to think about changes made by the orchestrator.
For example, I have noticed that I can't create the same user if I restart
the installer. I am not sure if something else in the system might be
affected as well.
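
One possible way to handle the leftover user, sketched in Python under the
assumption that the account ends up in the password database of the running
system: check whether the login already exists before trying to create it
again. The user_exists() helper and its "lookup" parameter are hypothetical
and are not part of the orchestrator.

```python
import pwd
from typing import Callable


def user_exists(login: str,
                lookup: Callable[[str], object] = pwd.getpwnam) -> bool:
    """Return True if the login is already in the password database."""
    try:
        lookup(login)
    except KeyError:
        # pwd.getpwnam raises KeyError when the entry is not found.
        return False
    return True
```

A restarted installer could then skip or redo the user creation depending
on the result, instead of failing in useradd.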

>
> I do think that we need to take a look at the robustness of both TI and 
> TM so that we can handle some unexpected situations better.
>
> The code changes for this bug are fine for putback.

Thank you for the review,
Jan

>
> thanks,
> sarah
> ****
>> Thank you very much for the review,
>> Jan
>>
>>
>> Sarah Jelinek wrote:
>>   
>>> Hi Jan,
>>>
>>> It makes sense to clean up if the useradd doesn't succeed, but how does 
>>> this fix clean up the other issues noted in the bug? Or is this fix 
>>> intended to do that?
>>>
>>> sarah
>>> ***
>>> jan damborsky wrote:
>>>     
>>>> Hi Sarah, Sundar,
>>>>
>>>> could I please ask you to review the changes for
>>>> the following bug?
>>>>
>>>> 88 installer goes nuts when useradd fails
>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=88
>>>>
>>>> Webrev is available at
>>>> http://cr.opensolaris.org/~dambi/bug-88/
>>>>
>>>> Thank you very much,
>>>> Jan
>>>>
>>>>
>>>>
>>>>       
>>> _______________________________________________
>>> caiman-discuss mailing list
>>> caiman-discuss at opensolaris.org
>>> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
>>>     
