On 01/11/12 04:37 PM, Dave Miner wrote:
On 01/11/12 09:34, Jan Damborsky wrote:
On 01/11/2012 12:32 AM, Dave Miner wrote:
On 01/10/12 05:17, Jan Damborsky wrote:
On 01/ 9/12 08:54 PM, Dave Miner wrote:
On 01/09/12 06:58, Jan Damborsky wrote:
On 01/ 6/12 06:26 PM, Dave Miner wrote:
On 01/06/12 04:40, Jan Damborsky wrote:
Hi Paul, Mike,

I am currently evaluating how to approach a fix for the following
problem, which you reported and commented on:

7058014 if svc-system-config creates rpool/export, it should mount it
at /export

As it's been a while since that discussion happened, let me start by
summarizing the problem and then look at possible solutions.

Overview
========

System configuration (the config-user smf service in particular) makes
it possible to create an initial user account. As part of that, the
config-user service creates a separate ZFS dataset for the user's home
directory. In the default case, the ZFS dataset
'<root_pool>/export/home/<login>' is created with its mountpoint
inherited from the parent ZFS dataset '<root_pool>/export/home'.

Since all installers create the <root_pool>/export and
<root_pool>/export/home ZFS datasets during the installation process
(utilizing the Target Instantiation module) with mountpoints set to
/export and /export/home respectively, we end up with the desired
'/export/home/<login>' mountpoint for the home ZFS dataset.

Problem statement
=================

That said, the Automated Installer (used for installation of non-global
zones) is somewhat special in that it provides complete control over
the hierarchy of ZFS datasets created.
That means it's possible to end up with a system without the
'<root_pool>/export' and '<root_pool>/export/home' datasets created
during installation. Such a configuration is accomplished by omitting
the appropriate entries in the target section of a customized AI
manifest.

In such a case, the '<root_pool>/export' and '<root_pool>/export/home'
datasets are later created by the config-user service along with the
home ZFS dataset, as a side effect of calling 'zfs create' with the
'-p' option, which forces creation of all non-existent parent ZFS
datasets. The problem is that those datasets are mounted on mountpoints
inherited from the parent dataset (the <root_pool> ZFS dataset in this
case), so we end up with the following structure:

dataset:mountpoint
------------------
<root_pool>/export:/<root_pool>/export
<root_pool>/export/home:/<root_pool>/export/home
<root_pool>/export/home/<login>:/<root_pool>/export/home/<login>

This is neither what the user expects nor what the user wants.
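
For illustration only, the inheritance rule behind that table can be modeled in a few lines of Python. This is a simplified sketch, not code from config-user or ZFS: the function name and the `explicit` mapping are hypothetical, and it assumes the pool's top-level dataset is mounted at /<pool> unless a mountpoint is set explicitly.

```python
def effective_mountpoint(dataset, explicit):
    """Return the mountpoint `dataset` ends up with under simple inheritance.

    `explicit` maps dataset names to mountpoints that were set explicitly
    (hypothetical input; a real system would read the ZFS properties).
    """
    if dataset in explicit:
        return explicit[dataset]
    parent, sep, leaf = dataset.rpartition("/")
    if not sep:
        # Assumption: the pool's top-level dataset defaults to /<pool>.
        return "/" + dataset
    # A child inherits the parent's mountpoint plus its own last component.
    return effective_mountpoint(parent, explicit).rstrip("/") + "/" + leaf


# Parents created implicitly by 'zfs create -p': nothing is set explicitly.
print(effective_mountpoint("rpool/export/home/jack", {}))
# Parents created by the installer with /export set on rpool/export.
print(effective_mountpoint("rpool/export/home/jack", {"rpool/export": "/export"}))
```

The first call shows the undesired /rpool/... layout, the second the layout the installers normally produce.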

Solution A
==========

If my understanding is correct, you propose to address that in the
config-user smf service by explicitly setting the desired mountpoints
for all parents created.

To be honest, I am not quite convinced that this solution fits the
existing model, as sysconfig should not explicitly manipulate datasets
which are outside its scope (parent datasets). Handling that task is
the job of the Target Instantiation module, and spreading that logic
across several places would be confusing and does not sound like a good
principle in general.

Another issue I can see with this is that those datasets are explicitly
configured in the default AI manifests. If the user intentionally omits
those entries in a customized AI manifest, I believe we should honor
that and not implicitly create those datasets contrary to the user's
intent.

Based on that, I propose the following alternative.

Solution B
==========

If config-user is asked to create the ZFS home dataset and its parents
are missing, treat that as a fatal error. In such a case, let the
config-user smf service inform the user on the console and let the
service enter maintenance mode.
The reasoning behind this is that such a situation would be the result
of a misconfiguration on the user's side: there seems to be a
requirement to create a ZFS dataset in a ZFS hierarchy that is not
compliant with the one explicitly expressed via the AI manifest.
I believe we shouldn't try to remedy such a state, as we can't ensure
the result would match the user's intent. Instead, we should let the
user know that an invalid configuration was supplied.
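
As a rough sketch of the check Solution B implies (not the actual config-user start method), the missing parents could be determined as below. The function and exception names are hypothetical, and the set of existing datasets is passed in; on a real system it could presumably be gathered from 'zfs list -H -o name'.

```python
class MissingParentDatasets(Exception):
    """Hypothetical error raised when the home dataset's parents are absent."""


def missing_parents(dataset, existing):
    """Return the ancestors of `dataset` (below the pool) not in `existing`.

    The result is ordered top-down, e.g. rpool/export before
    rpool/export/home.
    """
    parts = dataset.split("/")
    ancestors = ["/".join(parts[:i]) for i in range(2, len(parts))]
    return [name for name in ancestors if name not in existing]


def require_parents(dataset, existing):
    """Fail instead of silently creating parents (no 'zfs create -p')."""
    missing = missing_parents(dataset, existing)
    if missing:
        raise MissingParentDatasets(", ".join(missing))


# Example: only the pool itself exists, so both parents are reported.
print(missing_parents("rpool/export/home/jack", {"rpool"}))
```

In this sketch the service would catch the error, log it, and exit so that smf places it in maintenance.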

Please let me know if that would be a reasonable alternative, or if I
am missing other aspects of the problem which should be taken into
account when looking for a solution.


I'm not sure this proposal addresses how the user would recover from
and correct the invalid input. Can you walk through that?

Let me elaborate more on that, as I agree I missed that part.

In accordance with the current design, if config-user ends up in
maintenance mode as a result of a fatal failure, the user is provided
with an sulogin prompt.

In such a situation, the user is advised (on the console) to log in and
inspect the smf log file.

In our particular case, I think we could populate the smf log file with
more information about the error as well as instructions on how to
proceed. I am wondering if saying something like the following may do
the job:

"Service failed to create home directory ZFS dataset for initial user,
likely because one or more parent ZFS datasets are missing (were not
created during installation).
Reinstall the system with ai_manifest(4) specifying appropriate entries
for all non-existent parent datasets (see
/usr/share/auto-install/manifest/default.xml for example).
Alternatively, to recover from the failure, create parent dataset(s)
manually on command line and reboot the system."


I would expect the user should just be able to clear the config-user
service without having to go through a reboot cycle.

Thank you for pointing this out, Dave.

I agree that in such a case, it should be possible just to 'clear' the
service in order to recover.
I have verified that this is in fact already the case, i.e. 'svcadm
clear config-user' works and a reboot is not needed. This is reflected
in the updated error message below.
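
To make the recovery path concrete, here is a small Python sketch that only builds the commands an administrator might run; it executes nothing. The function name and the `mountpoints` argument are hypothetical, and setting mountpoint=/export on the export dataset is an assumption based on what the installers normally do, not something the service prescribes.

```python
def recovery_commands(missing, mountpoints=None, service="config-user"):
    """Build (but do not run) commands to create datasets and clear a service.

    `missing` lists datasets top-down; `mountpoints` optionally maps a
    dataset to a mountpoint to set explicitly at creation time.
    """
    mountpoints = mountpoints or {}
    commands = []
    for name in missing:
        cmd = ["zfs", "create"]
        if name in mountpoints:
            # Children created afterwards inherit from this mountpoint.
            cmd += ["-o", "mountpoint=" + mountpoints[name]]
        cmd.append(name)
        commands.append(cmd)
    # Clearing the maintenance state lets smf retry the start method.
    commands.append(["svcadm", "clear", service])
    return commands


for cmd in recovery_commands(
    ["rpool/export", "rpool/export/home"], {"rpool/export": "/export"}
):
    print(" ".join(cmd))
```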


A third possible fix is, of course, to modify the home directory
setting in the profile to use a dataset that's subordinate to a
dataset that has been created, then clear config-user (probably
requires a re-run of manifest-import, too).

Yep, though it would be more complex compared to the previous solution
(it would require the manifest-import step), and I am not sure if the
result would match the user's original intent.

From what I understand (Mike may correct me), the original problem
happened during the transition phase when AI switched from implicit
creation of shared ZFS datasets to explicit creation (and the
appropriate entries were added to the AI manifest).
The problem was caused by an old AI manifest being used as a template
to install a system with the new AI.
In that particular scenario, the desired result was a system with those
shared ZFS datasets created, and I have been assuming that this is what
we should aim for in the proposed recovery solutions.


We shouldn't be running into this with that original case anymore. The
thing that concerns me to a fair extent is that a system that's
installed with AI using a profile that has been deliberately modified
to exclude the export and export home entries but then configured with
sysconfig will now end up with a failure in configuration if there's a
user account created. What can we do to prevent a failure from
happening there? And how can we make it more obvious in the
configuration profiles and AI manifests that there is this linkage for
those that aren't using sysconfig? The mkdir solution mitigates that
failure case, but I agree it has its own problems.



Yet another possibility seems to be that we could, in this situation,
not create a separate dataset at all, and merely mkdir -p the home
directory. Yes, this would be a significantly different behavior.
The attraction is that it seems to get the system up and running,
though perhaps not ideally.

I can see this would be the least intrusive solution (no need for the
user's intervention), but to be honest, I am not quite convinced it
would be a desirable approach.

The fact that we end up in that error situation may be a sign that
there is likely something wrong with the user's AI configuration, and I
believe we should make the user aware of this, so that they can repair
the cause of the problem before the "malformed" configuration is used
to deploy more systems.
Also, I think the result of this solution would be a system in neither
an optimal nor a 'supported' state, which I think is not acceptable for
the enterprise scenarios where AI is used. Such systems may live for a
while, so I think that from a long-term point of view, it would be
better to deploy them as expected from the beginning.


I think it would be most unfortunate if the suggested solution is to
re-install, since that's 10-15 minutes at best, more likely quite a
bit longer.

I think it depends on the number of systems affected. If several
systems ended up in such a state as a result of using an inappropriate
AI configuration, then it might be faster to repair that configuration
(done once) and restart the installations rather than manually repair
all of them.

That said, I can see it would make sense to propose the 'local'
solution first, so I am wondering if you think rewording the error
message in the following way may better fit the intent:

"Service failed to create home directory ZFS dataset for initial user,
likely because one or more parent ZFS datasets are missing (were not
created during installation).
To recover from the failure, create parent dataset(s) manually on
command line and clear the service using 'svcadm clear config-user'.
Alternatively, reinstall affected system(s) using ai_manifest(4)
specifying appropriate entries for all non-existent parent datasets
(refer to /usr/share/auto-install/manifest/default.xml as an example)."


We need this error message to be definitive (not "most likely") and it
needs to prescribe the datasets that we think need to be created.

That would no doubt also be more professional :-)

Looking at the existing config-user start method, determining that
information should not be a big deal. The modified error message would
then look like:

"Service failed to create home directory ZFS dataset for initial user,
because the following parent ZFS datasets are missing (were not created
during installation):

rpool/export
rpool/export/home

To recover from the failure, create parent dataset(s) manually on command
line and clear the service using 'svcadm clear config-user'.
Alternatively, reinstall affected system(s) using ai_manifest(4) specifying
appropriate entries for those ZFS datasets (refer to
/usr/share/auto-install/manifest/default.xml as an example)."
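
A minimal Python sketch of how such a definitive message could be assembled from the list of missing datasets follows. The function name is hypothetical, the wording is adapted from the draft above rather than taken from any shipped service, and the reinstall alternative is left out for brevity.

```python
def missing_parents_message(missing, service="config-user"):
    """Format an error message that names every missing parent dataset."""
    header = [
        "Service failed to create home directory ZFS dataset for initial user,",
        "because the following parent ZFS datasets are missing (were not created",
        "during installation):",
        "",
    ]
    # One dataset per line, indented, so the user sees exactly what to create.
    listing = ["    " + name for name in missing]
    footer = [
        "",
        "To recover from the failure, create the parent dataset(s) manually on",
        "the command line and clear the service using 'svcadm clear %s'." % service,
    ]
    return "\n".join(header + listing + footer)


print(missing_parents_message(["rpool/export", "rpool/export/home"]))
```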


I see Mike made the suggestion I would have, so we'll consider that OK. However, you didn't answer my query above about whether there's a way to prevent these failures altogether in the case where sysconfig is used for interactive config on first boot. Thoughts?

To be honest, I overlooked that comment. Sorry about that.

As a potential solution for the interactive scenario, I can imagine
that the SCI tool could be amended to guide the user through such a
case. For instance, if the user decided to create an initial user
account, the SCI tool would verify that the parents of the home ZFS
dataset exist. If they didn't exist, the user could either be made
aware of the consequences (the need for subsequent manual
intervention), so that they could reconsider and not create the user
account, or the SCI tool could be even nicer and offer an option to
have those datasets created automatically later by the config-user smf
service.
That said, as this would not be a low-effort change (e.g. the UI design
team would need to be involved), I believe we would first need some
feedback from users in order to determine if it's worth the effort.
That case would affect people who intentionally removed those datasets
from the AI manifest. Assuming they know what they are doing, they are
aware of what's going on behind the scenes, so they may well also be
aware of the potential consequences, and that sort of 'disruption' may
be acceptable to them.



I think it would also be a good idea for a comment in the AI manifest to note this potential problem if the user chooses to delete the export dataset there.

I agree. Also, I think that we could add a couple of sentences
clarifying that point to the section of the sysconfig documentation
that covers creating the initial user account.




Somewhat tangential, but while I'm thinking about it, is there a
reason config-user can't just let useradd create the datasets now that
it has that functionality?

In fact, I took a quick look at that feature WRT a potential fix for
7030232 and found that what useradd currently provides does not quite
fit config-user's needs. In particular, useradd does not allow
customizing the home ZFS dataset (only its mountpoint), something which
config-user supports via the SC manifest. So for the non-default case,
config-user would still need to go with 'zfs create'.
Also, it would not help with the problem being discussed, as it relies
on the parent ZFS datasets already being created.
I am wondering if those limitations could be candidates for a useradd
RFE.


I think it's well worth discussing. We don't have much data to support it yet, but in general I would prefer consistency of capabilities between what's done during installation and later administrative operations.

Yep. There are likely more areas where 'config-user' implements things
in its own way (e.g. copying skeleton files to the home directory), and
I think those should be consolidated. There is already CR 7030232,
which I can imagine could serve for tracking that effort. But as you
pointed out, more data would be needed, for instance when determining
whether a particular feature which config-user currently supports (and
useradd does not) should be preserved (but with useradd taking care of
it) or removed.

Jan

_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
