Hi Jan.
Hi everyone.

Currently AI doesn't copy the logfile if the install fails because of the DDU or install errors. How come? I would like to fix this. Mary also filed bug 16088 against Driver-Update on this very issue.

But I wonder why AI wasn't copying the logfile on errors even before Driver-Update came along. All I can come up with is:

- if /a wasn't mounted, the copy could generate additional errors. (Seems like a small issue to me.)

The original idea behind this behavior was that if the installation fails, we should take the shortest path to abort in order to leave the system
as untouched as possible for inspection.
OK...
Also, the target BE is left mounted on /a in this case (assuming the failure happened
after the BE was created and mounted).
Sure, and it is important that this functionality remains.

In that case, we wanted to avoid a cascade of error messages not related to the failure itself - they would be confusing and could mask the real
problem - as a recent example, see
https://defect.opensolaris.org/bz/show_bug.cgi?id=11500#c8
OK. I suspected this, but wanted to make sure there wasn't some other reason as well...
Does anyone have a good reason why I shouldn't attempt to copy the logfile to /a whether or not any install errors occurred?

I can see that we could make this step more robust by checking for the
presence of the /a/var/sadm/system/logs directory first and copying log files
only if it exists.
Yes, I was thinking along these lines as well.

Looking at the existing ls_transfer() code [1], if the target directory does not exist, ls_transfer() calls mkdirp(3GEN) to create it. We could either go with that approach or change the behavior as described above. That would work, since the
/var/sadm/system/logs/ directory is currently being delivered by the
pkg://opensolaris.org/service/management/sysidtool package.
What I thought to be best would be to check to see if /a existed and if it did, then to transfer the log.

I moved the ls_transfer() call to above the test/exit for failure, and now check for INSTALLED_ROOT_DIR before calling ls_transfer(). See attached.
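
A minimal sketch of the kind of check described above, for illustration only. It is not the attached change: dir_exists(), copy_file() and transfer_log_if_root_exists() are hypothetical names, and a plain file copy stands in for ls_transfer(), whose signature is not shown in this thread. Unlike ls_transfer(), the sketch does not create missing subdirectories under the root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* True if path exists and is a directory. */
bool
dir_exists(const char *path)
{
    struct stat st;

    assert(path != NULL);
    return (stat(path, &st) == 0 && S_ISDIR(st.st_mode));
}

/* Plain byte copy; returns 0 on success, -1 on any I/O error. */
int
copy_file(const char *src, const char *dst)
{
    FILE *in, *out;
    char buf[4096];
    size_t n;
    int rc = 0;

    if ((in = fopen(src, "r")) == NULL)
        return (-1);
    if ((out = fopen(dst, "w")) == NULL) {
        (void) fclose(in);
        return (-1);
    }
    while ((n = fread(buf, 1, sizeof (buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    (void) fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return (rc);
}

/*
 * Hypothetical helper: copy src to <root>/<name> only if root exists.
 * Returns 0 if copied, 1 if skipped because root is absent, -1 on error.
 */
int
transfer_log_if_root_exists(const char *src, const char *root,
    const char *name)
{
    char dst[1024];
    int n;

    if (!dir_exists(root))
        return (1);     /* no target root: stay quiet, log stays in place */
    n = snprintf(dst, sizeof (dst), "%s/%s", root, name);
    if (n < 0 || (size_t)n >= sizeof (dst))
        return (-1);
    return (copy_file(src, dst));
}
```

The point of the separate "skipped" return value is that a missing root is not reported as a new error, so a failed install does not pick up extra messages from the copy step.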

These changes seem reasonable.


However, when I tested this I realized that there is an ICT called earlier which does the transfer as well, so the log is actually transferred twice! So even when I remove the log file move from auto_install.c, the transfer is still done and still gives errors when the ICT does it. I need to investigate where the ICT is coming from to know whether or not I can remove it as part of my fix. (If it is code that is common to other installer components, I may be better off leaving it alone.)

Looking at the ict_transfer_logs() ICT task

http://src.opensolaris.org/source/xref/caiman/slim_source/usr/src/lib/libict/ict.c#ict_transfer_logs

in the case of AI, it does not call ls_transfer(), but instead transfers additional log
files to the target:

/var/svc/log/application-auto-installer:default.log
/var/svc/log/application-manifest-locator:default.log
/var/adm/messages
/tmp/ai_combined_manifest.xml
OK. I quickly saw "Failed to transfer install log file" and didn't realize it was a more general error that came after the four files above were not transferred. I thought it was the install_log. Sorry about that.

No problem at all.

What kind of error do you see when the DDU fails?
- If the install completes but the DDU fails afterward, all log files get copied, and the last thing emitted by auto-install is:

Basic installation was successful.  However, there was an error
installing at least one additional driver package on target.
Please verify that all driver packages required for reboot are installed before rebooting.

- If the install does not complete successfully, the DDU isn't run afterward. If the failure includes not creating the directories for the above logfiles, we see errors copying those logfiles. The install_log doesn't get copied if /a doesn't exist.

If the install fails, then I think we should not end up trying
to transfer those log files, modulo the known scenario tracked by bug
OK.

15454 pkg install failure in im_pop did not abort DC and AI

in which case AI does not abort if there is a failure during
transfer mode.

Is this what might have caused the failure you encountered?
Yes, well sort of... I caused a failure on purpose by removing babel-install and SUNWcsd from the manifest. I saw lots of pkg errors as I expected, but then the ICTs ran to copy the log files afterward. I suspect that the directories the logfiles are copied to were part of the packages or incorporations I didn't install.

But to answer your question, a failed pkg install attempt should have halted the process before the ICTs tried to copy the logfiles. The right things should happen here when 15454 is fixed.

Seems like the right things will happen even if im_pop succeeds but some subsequent finalizer script fails. The install should stop before the ICTs would attempt to copy the logfiles.

So it sounds like what I was planning on doing, checking for existence of /a before copying /tmp/install_log, is enough. If /a exists, ls_transfer will create the directories to copy the logfile. If it doesn't, nothing will happen and no additional errors will be emitted; in this case AI will have failed, the system will not be rebooted, and one can inspect the install_log in /tmp.

Do you agree?

Seems like the ICT should also not try to copy its logfiles if their target directory doesn't exist.

To be honest, I was assuming that we can't end up there modulo the bug above -
this is the reason why I am trying to understand what led to that failure.


The copy attempts introduce the kind of error message noise that we were trying to avoid.

One way I can think of that we might end up with noise messages
emitted by ict_transfer_logs() is bug

11500 auto-install used with non-default -p (manifest) option causes ict_transfer_logs errors

But this is a slightly different problem - since some of the source files might not exist, ict_transfer_logs() should be more tolerant and check for the existence
of log files before it tries to copy them.
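
As a rough sketch of that tolerance, assuming nothing about how ict_transfer_logs() is actually written: the helper below walks a list of source paths and keeps only those that exist as regular files, so a later copy loop never touches a missing source. The names log_source_exists(), filter_existing_logs() and ai_extra_logs are hypothetical; the four paths are the ones listed earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* The additional AI log files named earlier in this thread. */
const char *const ai_extra_logs[] = {
    "/var/svc/log/application-auto-installer:default.log",
    "/var/svc/log/application-manifest-locator:default.log",
    "/var/adm/messages",
    "/tmp/ai_combined_manifest.xml"
};

/* True if path exists and is a regular file. */
bool
log_source_exists(const char *path)
{
    struct stat st;

    assert(path != NULL);
    return (stat(path, &st) == 0 && S_ISREG(st.st_mode));
}

/*
 * Hypothetical helper: store the entries of logs[] that exist into
 * present[] (which must have room for nlogs entries, or be NULL to
 * only count) and return how many were found. Missing sources are
 * skipped silently rather than reported as transfer errors.
 */
size_t
filter_existing_logs(const char *const *logs, size_t nlogs,
    const char **present)
{
    size_t i, found = 0;

    assert(logs != NULL || nlogs == 0);
    for (i = 0; i < nlogs; i++) {
        if (!log_source_exists(logs[i]))
            continue;   /* e.g. no combined manifest was generated */
        if (present != NULL)
            present[found] = logs[i];
        found++;
    }
    return (found);
}
```

A caller would pass ai_extra_logs with a count of 4 and copy only the returned entries; whether a skipped file deserves a debug-level note is a separate decision.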

However, I am not sure whether that bug might be triggered by some
of the DDU scenarios.
This shouldn't be triggered by DDU scenarios, but I introduced a deliberate failure which played into the 11500 scenario.

    Thanks,
    Jack

Thank you,
Jan


_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
