On 01/09/12 08:54 PM, Dave Miner wrote:
On 01/09/12 06:58, Jan Damborsky wrote:
On 01/06/12 06:26 PM, Dave Miner wrote:
On 01/06/12 04:40, Jan Damborsky wrote:
Hi Paul, Mike,

I am currently evaluating how to approach a fix for the following problem
you reported and commented on:

7058014 if svc-system-config creates rpool/export, it should mount it at
/export

As it's been a while since that discussion happened, let me start by
summarizing the problem and then take a look at possible solutions.

Overview
========

System configuration (the config-user SMF service in particular) provides
the ability to create an initial user account. As part of that, the
config-user service creates a separate ZFS dataset for the user's home
directory. In the default case, the ZFS dataset
'<root_pool>/export/home/<login>' is created with its mountpoint inherited
from the '<root_pool>/export/home' parent ZFS dataset.

Since all installers create the <root_pool>/export and
<root_pool>/export/home ZFS datasets during the installation process
(utilizing the Target Instantiation module), with mountpoints set to
/export and /export/home respectively, we end up with the desired
'/export/home/<login>' mountpoint for the home ZFS dataset.

Problem statement
=================

That said, the Automated Installer (used for installation of non-global
zones) is a little bit special in the sense that it provides complete
control over the hierarchy of ZFS datasets created.
That means it's possible to end up with a system without the
'<root_pool>/export' and '<root_pool>/export/home' datasets created during
installation. Such a configuration is accomplished by omitting the
appropriate entries in the target section of a customized AI manifest.

In such a case, the '<root_pool>/export' and '<root_pool>/export/home'
datasets are later created by the config-user service along with the home
ZFS dataset, as a side effect of calling 'zfs create' with the '-p' option,
which forces creation of all non-existent parent ZFS datasets. The problem
is that those datasets are mounted on mountpoints inherited from the parent
dataset (the <root_pool> ZFS dataset in this case), so we end up with the
following structure:

dataset                            mountpoint
-------                            ----------
<root_pool>/export                 /<root_pool>/export
<root_pool>/export/home            /<root_pool>/export/home
<root_pool>/export/home/<login>    /<root_pool>/export/home/<login>

This is what the user currently neither expects nor desires.
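For illustration, the behavior above can be reproduced with plain zfs commands. This is a sketch only; it assumes a root pool named 'rpool' whose own mountpoint is /rpool and a hypothetical login 'jack', and it requires a live system with ZFS:

```shell
# Only rpool exists; rpool/export and rpool/export/home are missing.
# config-user effectively runs the equivalent of:
zfs create -p rpool/export/home/jack

# 'zfs create -p' creates the missing parents but sets no mountpoint
# property on them, so each new dataset inherits from rpool:
zfs list -r -o name,mountpoint rpool/export
# NAME                     MOUNTPOINT
# rpool/export             /rpool/export
# rpool/export/home        /rpool/export/home
# rpool/export/home/jack   /rpool/export/home/jack
```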

Solution A
==========

If my understanding is correct, you propose to address this in the
config-user SMF service by explicitly setting the desired mountpoints for
all parent datasets created.

To be honest, I am not quite convinced that solution fits the existing
model, as sysconfig should not explicitly manipulate datasets which are out
of its scope (the parent datasets). It is the goal of the Target
Instantiation module to handle that task, and spreading that logic across
several places would be confusing, as well as not being a good principle in
general.

Another issue I can see with this is that those datasets are explicitly
configured in the default AI manifests. If a user intentionally omits those
entries in a customized AI manifest, I believe we should honor that and not
implicitly create those datasets against the user's intent.

Based on that, I propose the following alternative.

Solution B
==========

If config-user is asked to create the ZFS home dataset and its parents are
missing, treat that as a fatal error. In such a case, let the config-user
SMF service inform the user about it on the console and enter maintenance
mode.
The reasoning behind this is that such a situation would be the result of a
misconfiguration on the user's side; in particular, there appears to be a
requirement to create a ZFS dataset in a hierarchy not compliant with the
one explicitly expressed via the AI manifest.
I believe we shouldn't try to remedy such a state, as we can't ensure the
result would be compliant with the user's intent. Instead, we should let
the user know that an invalid configuration was supplied.

Please let me know if that may be a reasonable alternative, or if I am
missing other aspects of this problem which should be taken into account
when looking for a solution.


I'm not sure this proposal addresses how the user would recover from
and correct the invalid input.  Can you walk through that?

Let me elaborate more on that, as I agree I missed that part.

In accordance with the current design, if config-user ends up in
maintenance mode as a result of a fatal failure, the user is presented with
a sulogin prompt.

In such a situation, the user is advised (on the console) to log in and
inspect the SMF log file.

In our particular case, I think we could populate the SMF log file with
more information about the error, as well as instructions on how to
proceed. I am wondering if saying something like the following may do the
job:

"Service failed to create the home directory ZFS dataset for the initial
user, likely because one or more parent ZFS datasets are missing (they were
not created during installation).
Reinstall the system with an ai_manifest(4) specifying appropriate entries
for all non-existent parent datasets (see
/usr/share/auto-install/manifest/default.xml for an example).
Alternatively, to recover from the failure, create the parent dataset(s)
manually on the command line and reboot the system."


I would expect the user should just be able to clear the config-user service without having to go through a reboot cycle.

Thank you for pointing this out, Dave.

I agree that in such a case, it should be possible just to 'clear' the
service in order to recover. I have verified that this is in fact already
the case, i.e. 'svcadm clear config-user' works and a reboot is not needed.
This is reflected in the updated error message below.


A third possible fix is, of course, to modify the home directory setting in the profile to use a dataset that's subordinate to a dataset that has been created, then clear config-user (probably requires a re-run of manifest-import, too).

Yep, though it would be more complex compared to the previous solution (it
would require a manifest-import step), and I am not sure the result would
be compliant with the original user's intent.

From what I understand (Mike may correct me), the original problem happened
during the transition phase when AI switched from implicit creation of the
shared ZFS datasets to explicit creation (and the appropriate entries were
added to the AI manifest).
The problem was caused by the fact that an old AI manifest was used as a
template to install a system with the new AI.
In that particular scenario, the desired result was to end up with a system
with those shared ZFS datasets created, and I have been assuming that this
is what we should aim for in the proposed recovery solutions.



Yet another possibility seems to be that we could, in this situation, not create a separate dataset at all, and merely mkdir -p the home directory. Yes, this would be a significantly different behavior. The attraction is that it seems to get the system up and running, though perhaps not ideally.

I can see this would be the least intrusive solution (no need for user
intervention), but to be honest, I am not quite convinced it would be a
desirable approach.

The fact that we end up in this error situation may be a sign that
something is wrong with the user's AI configuration, and I believe we
should make the user aware of it, so that they can repair the cause of the
problem before the "malformed" configuration is used to deploy more
systems.
Also, I think the result of this solution would be a system in neither an
optimal nor a 'supported' state, which I think is not acceptable for the
enterprise scenarios where AI is used. Such systems may live for a while,
so I think that from a long-term point of view, it would be better to
deploy them as expected from the beginning.


I think it would be most unfortunate if the suggested solution is to re-install, since that's 10-15 minutes at best, more likely quite a bit longer.

I think it depends on the number of systems affected. If multiple systems
ended up in such a state as a result of using an inappropriate AI
configuration, then it might be faster to repair that configuration (done
once) and restart the installations rather than manually repair all of
them.

That said, I can see it would make sense to first propose a 'local'
solution, so I am wondering if you think rewording the error message in the
following way may better fit the intent:

"Service failed to create the home directory ZFS dataset for the initial
user, likely because one or more parent ZFS datasets are missing (they were
not created during installation).
To recover from the failure, create the parent dataset(s) manually on the
command line and clear the service using 'svcadm clear config-user'.
Alternatively, reinstall the affected system(s) using an ai_manifest(4)
specifying appropriate entries for all non-existent parent datasets (refer
to /usr/share/auto-install/manifest/default.xml as an example)."
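Spelled out, the 'local' recovery described in that message would amount to something like the following sketch. It assumes a root pool named 'rpool' and the conventional /export layout, and must be run on the affected system with appropriate privileges:

```shell
# Create the missing parents with their conventional mountpoints.
zfs create -o mountpoint=/export rpool/export
zfs create rpool/export/home       # inherits mountpoint /export/home

# Clear the maintenance state so config-user retries creating the
# user's home dataset; no reboot is required.
svcadm clear config-user
```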

Jan

_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
