Hi Jan.
Hi everyone.
Currently AI doesn't copy the logfile if the install fails
because of the DDU or install errors. How come? I would like
to fix this. Mary also filed bug 16088 against Driver-Update on
this very issue.
But I wonder why AI wasn't copying the logfile on errors even
before Driver-Update came along. All I can come up with is:
- if /a wasn't mounted, the copy could generate additional
errors. (Seems like a small issue to me.)
The original idea behind this behavior was that if the installation
fails, we should take the shortest path to abort, in order to leave
the system as untouched as possible for inspection.
OK...
Also, the target BE is left mounted on /a in this case (assuming the
failure happened after the BE was created and mounted).
Sure, and it is important that this functionality remains.
In that case, we wanted to avoid a cascade of error messages not
related to the failure itself; they would be confusing and could mask
the real problem. For a recent example, see
https://defect.opensolaris.org/bz/show_bug.cgi?id=11500#c8
OK. I suspected this, but wanted to make sure there wasn't some
other reason as well...
Does anyone have a good reason why I shouldn't attempt to copy
the logfile to /a whether or not any install errors occurred?
I can see that we could make this step more robust by checking for
the presence of the /a/var/sadm/system/logs directory first and
copying the log files only if it exists.
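A minimal C sketch of that check, assuming a plain stdio copy.
copy_file_to_dir() and the copy_result codes are hypothetical names,
not code from auto_install.c or libict. When the destination directory
is missing, the function skips quietly instead of emitting an error.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical result codes for this sketch. */
enum copy_result { COPY_OK, COPY_SKIPPED, COPY_ERROR };

/*
 * Copy 'src' into directory 'dst_dir', keeping its base name.
 * If 'dst_dir' does not exist (for example, because the package that
 * delivers /a/var/sadm/system/logs was never installed), return
 * COPY_SKIPPED without printing anything.
 */
enum copy_result
copy_file_to_dir(const char *src, const char *dst_dir)
{
	struct stat st;
	char dst[1024], buf[4096];
	const char *base;
	size_t n;
	int len, ok = 1;
	FILE *in, *out;

	assert(src != NULL && dst_dir != NULL);

	/* Only copy if the target directory is already there. */
	if (stat(dst_dir, &st) != 0 || !S_ISDIR(st.st_mode))
		return (COPY_SKIPPED);

	/* Destination is dst_dir/<base name of src>. */
	base = strrchr(src, '/');
	base = (base == NULL) ? src : base + 1;
	len = snprintf(dst, sizeof (dst), "%s/%s", dst_dir, base);
	if (len < 0 || (size_t)len >= sizeof (dst))
		return (COPY_ERROR);

	if ((in = fopen(src, "rb")) == NULL)
		return (COPY_ERROR);
	if ((out = fopen(dst, "wb")) == NULL) {
		(void) fclose(in);
		return (COPY_ERROR);
	}
	/* Stream the file in fixed-size chunks. */
	while ((n = fread(buf, 1, sizeof (buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			ok = 0;
			break;
		}
	}
	if (ferror(in))
		ok = 0;
	(void) fclose(in);
	if (fclose(out) != 0)
		ok = 0;
	return (ok ? COPY_OK : COPY_ERROR);
}
```

In AI this would be called as, for example,
copy_file_to_dir("/tmp/install_log", "/a/var/sadm/system/logs"), and a
COPY_SKIPPED result would simply be ignored.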
Yes, I was thinking along these lines as well.
Looking at the existing ls_transfer() code [1], if the target
directory does not exist, ls_transfer() calls mkdirp(3GEN) to create
it. We could either keep that approach or change the behavior as
described above. Checking first would work, since the
/var/sadm/system/logs/ directory is currently delivered by the
pkg://opensolaris.org/service/management/sysidtool package.
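For comparison, the create-if-missing approach looks roughly like the
following. mkdirp(3GEN) is Solaris-specific, so this sketch uses a
portable stand-in, make_dirs(), a hypothetical name that is not the
ls_transfer() code itself.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_PATH_LEN 1024	/* sketch limit, not PATH_MAX */

/*
 * Portable stand-in for Solaris mkdirp(3GEN): create 'path' and any
 * missing parent directories. Returns 0 if 'path' ends up existing as
 * a directory, -1 otherwise.
 */
int
make_dirs(const char *path, mode_t mode)
{
	char buf[MAX_PATH_LEN];
	struct stat st;
	size_t len;
	char *p;

	assert(path != NULL);
	len = strlen(path);
	if (len == 0 || len >= sizeof (buf)) {
		errno = (len == 0) ? ENOENT : ENAMETOOLONG;
		return (-1);
	}
	(void) memcpy(buf, path, len + 1);

	/* Create each intermediate component; existing ones are fine. */
	for (p = buf + 1; *p != '\0'; p++) {
		if (*p != '/')
			continue;
		*p = '\0';
		if (mkdir(buf, mode) != 0 && errno != EEXIST)
			return (-1);
		*p = '/';
	}
	if (mkdir(buf, mode) != 0 && errno != EEXIST)
		return (-1);

	/* Something other than a directory at 'path' is a failure. */
	if (stat(buf, &st) != 0 || !S_ISDIR(st.st_mode))
		return (-1);
	return (0);
}
```

The trade-off is that this always produces the directory, even on a
half-installed target, whereas checking first leaves such a target
untouched.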
What I thought best was to check whether /a exists and, if it does,
transfer the log.
I moved the ls_transfer() call to above the test/exit for failure,
and now check for INSTALLED_ROOT_DIR before calling ls_transfer().
See attached.
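The reordering described above can be sketched as follows. The
attachment is not reproduced here, so transfer_install_log() is a
hypothetical stand-in for the real ls_transfer() call (whose signature
is not shown in this thread), and INSTALLED_ROOT_DIR is assumed to be
"/a".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Assumed value: the mount point of the target BE. */
#define INSTALLED_ROOT_DIR "/a"

/* Hypothetical stand-in for the ls_transfer() call in auto_install.c. */
static void
transfer_install_log(const char *root)
{
	(void) printf("transferring /tmp/install_log to %s\n", root);
}

/*
 * Called before the test/exit for install failure, so the log is
 * copied whether or not the install succeeded. Returns true if the
 * transfer was attempted. If 'root' does not exist, nothing happens
 * and no extra error is printed; /tmp/install_log stays for inspection.
 */
bool
transfer_log_if_root_exists(const char *root)
{
	struct stat st;

	assert(root != NULL);
	if (stat(root, &st) != 0 || !S_ISDIR(st.st_mode))
		return (false);
	transfer_install_log(root);
	return (true);
}

/*
 * Intended call order (sketch):
 *
 *	(void) transfer_log_if_root_exists(INSTALLED_ROOT_DIR);
 *	if (install_failed)
 *		exit(EXIT_FAILURE);
 */
```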
These changes seem reasonable.
However, when I tested this I realized that there is an ICT called
earlier which does the transfer as well, so the log is actually
transferred twice! So even when I remove the log file move from
auto_install.c, it is still done and still gives errors when the
ICT does it. I need to investigate where the ICT is coming from to
know whether or not I can remove it as part of my fix. (If it is
code that is common to other installer components, I may be better
off leaving it alone.)
Looking at the ict_transfer_logs() ICT task
http://src.opensolaris.org/source/xref/caiman/slim_source/usr/src/lib/libict/ict.c#ict_transfer_logs
in the case of AI, it does not call ls_transfer(), but instead
transfers additional log files to the target:
/var/svc/log/application-auto-installer:default.log
/var/svc/log/application-manifest-locator:default.log
/var/adm/messages
/tmp/ai_combined_manifest.xml
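For illustration only, the list above as a C table plus a helper that
maps each file to a destination path. The thread does not say where on
the target ict_transfer_logs() puts these files, so the destination
directory is a parameter, and keeping the base name is an assumption;
ai_log_files, ai_log_dest() and print_ai_log_plan() are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The additional AI log files listed above. */
static const char *const ai_log_files[] = {
	"/var/svc/log/application-auto-installer:default.log",
	"/var/svc/log/application-manifest-locator:default.log",
	"/var/adm/messages",
	"/tmp/ai_combined_manifest.xml",
};

#define AI_LOG_COUNT (sizeof (ai_log_files) / sizeof (ai_log_files[0]))

/*
 * Build dst_dir/<base name of src> into 'out'.
 * Returns 0 on success, -1 if 'out' is too small.
 */
int
ai_log_dest(const char *dst_dir, const char *src, char *out, size_t len)
{
	const char *base = strrchr(src, '/');
	int n;

	assert(dst_dir != NULL && src != NULL && out != NULL);
	base = (base == NULL) ? src : base + 1;
	n = snprintf(out, len, "%s/%s", dst_dir, base);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}

/* Print the planned source -> destination pairs. */
void
print_ai_log_plan(const char *dst_dir)
{
	char dst[1024];
	size_t i;

	for (i = 0; i < AI_LOG_COUNT; i++) {
		if (ai_log_dest(dst_dir, ai_log_files[i], dst,
		    sizeof (dst)) == 0)
			(void) printf("%s -> %s\n", ai_log_files[i], dst);
	}
}
```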
OK. I quickly saw "Failed to transfer install log file" and didn't
realize it was a more general error that came after the four files
above were not transferred. I thought it was the install_log. Sorry
about that.
No problem at all.
What kind of error do you see in case DDU fails?
- If the install completes but the DDU fails afterward, all log files
get copied, and the last thing emitted by auto-install is:
Basic installation was successful. However, there was an error
installing at least one additional driver package on target.
Please verify that all driver packages required for reboot are
installed before rebooting.
- If the install does not complete successfully, the DDU isn't run
afterward. If the failure includes not creating the directories for
the above logfiles, we see errors copying those logfiles. The
install_log doesn't get copied if /a doesn't exist.
If install fails, then I think we should not end up trying
to transfer those log files, modulo the known scenario tracked by
bug 15454, "pkg install failure in im_pop did not abort DC and AI",
in which case AI does not abort if there is a failure during
transfer mode.
OK.
Is this what might have caused the failure you encountered?
Yes, well sort of... I caused a failure on purpose by removing
babel-install and SUNWcsd from the manifest. I saw lots of pkg errors
as I expected, but then the ICTs ran to copy the log files afterward. I
suspect that the directories the logfiles are copied to were part of the
packages or incorporations I didn't install.
But to answer your question, a failed pkg install attempt should have
halted the process before the ICTs tried to copy the logfiles. The
right things should happen here when 15454 is fixed.
Seems like the right things will happen even if im_pop succeeds but some
subsequent finalizer script fails. The install should stop before the
ICTs would attempt to copy the logfiles.
So it sounds like what I was planning on doing, checking for existence
of /a before copying /tmp/install_log, is enough. If /a exists,
ls_transfer will create the directories to copy the logfile. If it
doesn't, nothing will happen and no additional errors will be emitted;
in this case AI will have failed, the system will not be rebooted, and
one can inspect the install_log in /tmp.
Do you agree?
Seems like the ICT should also not try to copy its logfiles if their
target directory doesn't exist.
To be honest, I was assuming that we can't end up there, modulo the
bug above; this is why I am trying to understand what led to that
failure.
The copy attempts introduce the kind of error message noise that we
were trying to avoid.
One way I could think of that we might end up with noise messages
emitted by ict_transfer_logs() is bug 11500, "auto-install used with
non-default -p (manifest) option causes ict_transfer_logs errors".
But this is a slightly different problem: since some of the source
files might not exist, ict_transfer_logs() should be more tolerant and
check for the existence of the log files before it tries to copy them.
However, I am not sure whether that bug might be triggered by any of
the DDU scenarios.
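A sketch of the more tolerant behavior suggested above: skip the whole
step if the target directory is missing, and silently skip any source
log that does not exist (as can happen in the bug 11500 scenario).
transfer_existing_logs() is a hypothetical name, and the printf() is a
placeholder for the real copy, not the libict implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if 'path' exists and is a directory. */
static bool
is_dir(const char *path)
{
	struct stat st;

	return (stat(path, &st) == 0 && S_ISDIR(st.st_mode));
}

/*
 * Tolerant log transfer: do nothing if 'dst_dir' is missing, and skip
 * without an error any source log that does not exist or is not
 * readable. Returns the number of logs handed to the copy step.
 */
int
transfer_existing_logs(const char *const *srcs, size_t count,
    const char *dst_dir)
{
	size_t i;
	int copied = 0;

	assert(srcs != NULL && dst_dir != NULL);
	if (!is_dir(dst_dir))
		return (0);
	for (i = 0; i < count; i++) {
		if (access(srcs[i], R_OK) != 0)
			continue;	/* missing log: not an error */
		/* Placeholder for the real copy (hypothetical). */
		(void) printf("copy %s -> %s\n", srcs[i], dst_dir);
		copied++;
	}
	return (copied);
}
```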
This shouldn't be triggered by DDU scenarios, but I introduced a
deliberate failure which played into the 11500 scenario.
Thanks,
Jack
Thank you,
Jan
_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss