On 01/13/12 07:25, Jan Damborsky wrote:
On 01/12/12 07:47 PM, Dave Miner wrote:
On 01/12/12 07:16, Jan Damborsky wrote:
On 01/11/12 04:37 PM, Dave Miner wrote:
On 01/11/12 09:34, Jan Damborsky wrote:
On 01/11/2012 12:32 AM, Dave Miner wrote:
On 01/10/12 05:17, Jan Damborsky wrote:
On 01/ 9/12 08:54 PM, Dave Miner wrote:
On 01/09/12 06:58, Jan Damborsky wrote:
On 01/ 6/12 06:26 PM, Dave Miner wrote:
On 01/06/12 04:40, Jan Damborsky wrote:
Hi Paul, Mike,
I am currently evaluating how to approach a fix for the following
problem, which you reported and commented on:
7058014 if svc-system-config creates rpool/export, it should mount it
at /export
As it's been a while since that discussion happened, let me start by
summarizing the problem and then look at possible solutions.
Overview
========
System configuration (the config-user SMF service in particular) makes
it possible to create an initial user account. As part of that, the
config-user service creates a separate ZFS dataset for the user's home
directory. In the default case, the ZFS dataset
'<root_pool>/export/home/<login>' is created with its mountpoint
inherited from the '<root_pool>/export/home' parent ZFS dataset.
Since all installers create the <root_pool>/export and
<root_pool>/export/home ZFS datasets during the installation process
(utilizing the Target Instantiation module) with mountpoints set to
/export and /export/home respectively, we end up with the desired
'/export/home/<login>' mountpoint for the home ZFS dataset.
Problem statement
=================
That said, the Automated Installer (used for installation of non-global
zones) is a little bit special in the sense that it provides complete
control over the hierarchy of ZFS datasets created.
That means it's possible to end up with a system without the
'<root_pool>/export' and '<root_pool>/export/home' datasets created
during installation. Such a configuration is accomplished by omitting
the appropriate entries in the target section of a customized AI
manifest.
In such a case, the '<root_pool>/export' and '<root_pool>/export/home'
datasets are later created by the config-user service along with the
home ZFS dataset, as a side effect of calling 'zfs create' with the
'-p' option, which creates all non-existent parent ZFS datasets. The
problem is that those datasets are mounted on mountpoints inherited
from the parent dataset (the <root_pool> ZFS dataset in this case), so
we end up with the following structure:
dataset:mountpoint
------------------
<root_pool>/export:/<root_pool>/export
<root_pool>/export/home:/<root_pool>/export/home
<root_pool>/export/home/<login>:/<root_pool>/export/home/<login>
This is neither what the user expects nor what the user wants.
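The inheritance behaviour described above can be illustrated with a small Python model. This is only a sketch: it does not call ZFS, the function name effective_mountpoint and the login "jack" are made up for illustration, and it assumes the pool's root dataset is mounted at /<root_pool>, as in the listing above.

```python
def effective_mountpoint(dataset, explicit):
    """Model ZFS mountpoint inheritance (illustrative only, not a ZFS API).

    `explicit` maps dataset names to mountpoints that were set explicitly.
    A dataset without an explicit mountpoint inherits from its nearest
    ancestor that has one, with the remaining path components appended.
    """
    parts = dataset.split("/")
    for i in range(len(parts), 0, -1):
        ancestor = "/".join(parts[:i])
        if ancestor in explicit:
            suffix = "/".join(parts[i:])
            if not suffix:
                return explicit[ancestor]
            return explicit[ancestor].rstrip("/") + "/" + suffix
    # Assumption: nothing is set explicitly, so the pool root is mounted
    # at /<pool> and everything below inherits from it.
    return "/" + dataset


home = "rpool/export/home/jack"  # hypothetical login "jack"

# Installer created export and export/home with explicit mountpoints.
installer_case = {
    "rpool": "/rpool",
    "rpool/export": "/export",
    "rpool/export/home": "/export/home",
}
# Manifest omitted both entries; 'zfs create -p' created them later, so
# only the pool root carries a mountpoint of its own.
omitted_case = {"rpool": "/rpool"}

print(effective_mountpoint(home, installer_case))
print(effective_mountpoint(home, omitted_case))
```

In the first case the home dataset resolves under /export/home, in the second under /rpool/export/home, which matches the structure listed above.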
Solution A
==========
If my understanding is correct, you propose to address that in the
config-user SMF service by explicitly setting the desired mountpoints
for all parents created.
To be honest, I am not convinced that solution fits the existing model,
as sysconfig should not explicitly manipulate datasets which are
outside its scope (parent datasets). Handling that task is the job of
the Target Instantiation module, and spreading that logic across
several places would be confusing and does not sound like a good
principle in general.
Another issue I see with this is that those datasets are explicitly
configured in the default AI manifests. If the user intentionally omits
those entries in a customized AI manifest, I believe we should honor
that and not implicitly create those datasets against the user's
intent.
Based on that, I propose the following alternative.
Solution B
==========
If config-user is asked to create the ZFS home dataset and its parents
are missing, treat that as a fatal error. In such a case, let the
config-user SMF service inform the user on the console and enter
maintenance mode.
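A minimal Python sketch of the check Solution B implies is shown here. It is hypothetical throughout: the names MissingParentDatasets, missing_parents and require_parents are invented for illustration, the set of existing datasets is passed in rather than queried from ZFS, and nothing is assumed about how the real config-user start method is written.

```python
class MissingParentDatasets(Exception):
    """Raised when parents of the home dataset do not exist."""

    def __init__(self, missing):
        super().__init__("missing parent datasets: " + ", ".join(missing))
        self.missing = missing


def missing_parents(home_dataset, existing):
    """Return the ancestors of home_dataset absent from `existing`,
    top-most first."""
    parts = home_dataset.split("/")
    parents = ["/".join(parts[:i]) for i in range(1, len(parts))]
    return [p for p in parents if p not in existing]


def require_parents(home_dataset, existing):
    """Refuse to proceed (instead of creating parents implicitly)
    when any parent dataset is missing."""
    missing = missing_parents(home_dataset, existing)
    if missing:
        raise MissingParentDatasets(missing)


# Hypothetical system where only the pool root dataset exists.
existing = {"rpool"}
try:
    require_parents("rpool/export/home/jack", existing)
except MissingParentDatasets as err:
    print(err)
```

The point of the sketch is only the ordering: the parents are checked before any dataset is created, so the failure is reported rather than papered over by implicit parent creation.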
The reasoning behind this is that such a situation would be the result
of a misconfiguration on the user's side: in particular, there is a
request to create a ZFS dataset in a ZFS hierarchy that does not comply
with the one explicitly expressed via the AI manifest. I believe we
shouldn't try to remedy such a state, as we can't ensure the result
would match the user's intent. Instead, we should let the user know
that an invalid configuration was supplied.
Please let me know whether that is a reasonable alternative, or whether
I am missing other aspects of this problem which should be taken into
account when looking for a solution.
I'm not sure this proposal addresses how the user would recover from
and correct the invalid input. Can you walk through that?
Let me elaborate on that, as I agree I missed that part.
In accordance with the current design, if config-user ends up in
maintenance mode as a result of a fatal failure, the user is given an
sulogin prompt. In that situation, the user is advised (on the console)
to log in and inspect the SMF log file.
In our particular case, I think we could populate the SMF log file with
more information about the error as well as instructions on how to
proceed. I am wondering if saying something like the following may do
the job:
"Service failed to create home directory ZFS dataset for initial user,
likely because one or more parent ZFS datasets are missing (were not
created during installation).
Reinstall the system with ai_manifest(4) specifying appropriate entries
for all non-existent parent datasets (see
/usr/share/auto-install/manifest/default.xml for example).
Alternatively, to recover from the failure, create parent dataset(s)
manually on command line and reboot the system."
I would expect the user should just be able to clear the config-user
service without having to go through a reboot cycle.
Thank you for pointing this out, Dave.
I agree that in such a case it should be possible to just 'clear' the
service in order to recover. I have verified that this is in fact
already the case, i.e. 'svcadm clear config-user' works and a reboot is
not needed. This is reflected in the updated error message below.
A third possible fix is, of course, to modify the home directory
setting in the profile to use a dataset that's subordinate to a
dataset that has been created, then clear config-user (probably
requires a re-run of manifest-import, too).
Yep, though it would be more complex compared to the previous solution
(it would require a manifest-import step) and I am not sure the result
would comply with the user's original intent.
From what I understand (Mike may correct me), the original problem
happened during the transition phase when AI switched from implicit
creation of shared ZFS datasets to explicit creation (and the
appropriate entries were added to the AI manifest). The problem was
caused by an old AI manifest being used as a template to install a
system with the new AI.
In that particular scenario, the desired result was a system with those
shared ZFS datasets created, and I have been assuming that this is what
we should aim for in the proposed recovery solutions.
We shouldn't be running into this with that original case anymore. The
thing that concerns me to a fair extent is that a system that's
installed with AI using a profile that has been deliberately modified
to exclude the export and export home entries but then configured with
sysconfig will now end up with a failure in configuration if there's a
user account created. What can we do to prevent a failure from
happening there? And how can we make it more obvious in the
configuration profiles and AI manifests that there is this linkage for
those that aren't using sysconfig? The mkdir solution mitigates that
failure case, but I agree it has its own problems.
Yet another possibility seems to be that we could, in this situation,
not create a separate dataset at all, and merely mkdir -p the home
directory. Yes, this would be a significantly different behavior.
The attraction is that it seems to get the system up and running,
though perhaps not ideally.
I can see this would be the least intrusive solution (no need for the
user's intervention), but to be honest, I am not convinced it would be
a desirable approach.
The fact that we end up in that error situation may be a sign that
something is wrong with the user's AI configuration, and I believe we
should make the user aware of this so that the cause of the problem can
be repaired before the "malformed" configuration is used to deploy more
systems.
Also, I think the result of this solution would be a system in neither
an optimal nor a 'supported' state, which I think is not acceptable for
the enterprise scenarios where AI is used. Such systems may live for a
while, so from a long-term point of view it would be better to deploy
them as expected from the beginning.
I think it would be most unfortunate if the suggested solution is to
re-install, since that's 10-15 minutes at best, more likely quite a
bit longer.
I think it depends on the number of systems affected. If several
systems ended up in such a state as a result of using an inappropriate
AI configuration, then it might be faster to repair that configuration
(done once) and restart the installations than to repair all of them
manually.
That said, I can see it would make sense to propose the 'local'
solution first, so I am wondering if you think rewording the error
message in the following way would better fit the intent:
"Service failed to create home directory ZFS dataset for initial user,
likely because one or more parent ZFS datasets are missing (were not
created during installation).
To recover from the failure, create parent dataset(s) manually on
command line and clear the service using 'svcadm clear config-user'.
Alternatively, reinstall affected system(s) using ai_manifest(4)
specifying appropriate entries for all non-existent parent datasets
(refer to /usr/share/auto-install/manifest/default.xml as an example)."
We need this error message to be definitive (not "most likely") and it
needs to prescribe the datasets that we think need to be created.
That would no doubt also be more professional :-)
Looking at the existing config-user start method, determining that
information should not be a big deal. The modified error message would
then look like:
"Service failed to create home directory ZFS dataset for initial user,
because following parent ZFS datasets are missing (were not created
during installation):
rpool/export
rpool/export/home
To recover from the failure, create parent dataset(s) manually on
command line and clear the service using 'svcadm clear config-user'.
Alternatively, reinstall affected system(s) using ai_manifest(4)
specifying appropriate entries for those ZFS datasets (refer to
/usr/share/auto-install/manifest/default.xml as an example)."
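As a rough illustration of how such a definitive message could be assembled once the missing datasets are known, here is a short Python sketch. The function name format_failure_message is hypothetical, the wording is copied from the draft above, and the list of missing datasets is assumed to have been determined elsewhere; nothing here reflects how the actual start method produces its output.

```python
def format_failure_message(missing):
    """Build the proposed log message, naming each missing dataset.

    `missing` is a list of dataset names, e.g. produced by whatever
    check the start method performs (not shown here).
    """
    lines = [
        "Service failed to create home directory ZFS dataset for initial user,",
        "because following parent ZFS datasets are missing (were not created",
        "during installation):",
    ]
    # One dataset per line so the user can see exactly what to create.
    lines.extend(missing)
    lines += [
        "To recover from the failure, create parent dataset(s) manually on",
        "command line and clear the service using 'svcadm clear config-user'.",
        "Alternatively, reinstall affected system(s) using ai_manifest(4)",
        "specifying appropriate entries for those ZFS datasets (refer to",
        "/usr/share/auto-install/manifest/default.xml as an example).",
    ]
    return "\n".join(lines)


print(format_failure_message(["rpool/export", "rpool/export/home"]))
```

Listing the datasets verbatim is what makes the message prescriptive rather than a guess at the cause.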
I see Mike made the suggestion I would have, so we'll consider that
OK. However, you didn't answer my query above about whether there's a
way to prevent these failures altogether in the case where sysconfig
is used for interactive config on first boot. Thoughts?
To be honest, I overlooked that comment. Sorry about that.
As a potential solution for the interactive scenario, I can imagine that
the SCI tool could be amended to guide the user through such a case.
For instance, if the user decided to create an initial user account,
the SCI tool would verify that the parents of the home ZFS dataset
exist. If they didn't exist, the user could either be made aware of the
consequences (the need for subsequent manual intervention), and could
then reconsider and not create the user account, or the SCI tool could
be even nicer and offer an option to have those datasets created
automatically later by the config-user SMF service.
That said, as this would not be a low-effort change (e.g. the UI design
team would need to be involved), I believe we would need some feedback
from users first in order to determine whether it's worth the effort.
As that case would affect people who intentionally removed those
datasets from the AI manifest, then assuming they know what they are
doing, they are aware of what's going on behind the scenes, so they may
well also be aware of the potential consequences, and that sort of
'disruption' may be acceptable to them.
I'm a little less confident that users know what they are doing :-)
Yeah, you are likely closer to the real state of things, and that was
likely wishful thinking on my side :-)
Improving the commenting in the manifest would make it more likely
that they would not remove those datasets without realizing the
consequences. However, there's also the possibility that in some,
perhaps many, environments, the person who sets up the manifests won't
necessarily be the same person who operates and configures the system
later, especially if sysconfig configure is run later on to
reconfigure the system.
This is a good point. I didn't think about that aspect.
Another solution that occurs to me is that the initial user account
creation could perhaps be disabled in sysconfig when it detects such a
condition. I'm not sure I like it, necessarily, but it also prevents
the system ending up in maintenance mode from this situation, which
seems important to me.
Compared with the other alternatives, I think this would be the best
trade-off for now. In addition, the related help screen could be
amended to clarify that behavior.
Unless we work out another solution, and if you agree, I would proceed
with this approach. I think filing a separate CR would be appropriate,
as these changes would end up in the install gate (while the
config-user ones would go into the ON one).
I don't have a better idea, but I'd like Drew to agree with this, too.
Dave
_______________________________________________
caiman-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss