** Description changed:
- I have a server that has been running its data volume using ZFS in 20.04
- without any problem. The volume is using ZFS encryption and a raidz1-0
- configuration. I performed a scrub operations before the upgrade and it
- did not find any problem. After the reboot for the upgrade, I was
- welcomed with the following message:
+ [ Impact ]
+ Upgrading from 20.04 to 22.04 causes encrypted pools to become unmountable.
+ This is due to broken accounting metadata causing checksum errors on decrypt,
+ which makes ZFS error out early with ECKSUM.
+
+ [ Test Plan ]
+ This issue needs specific accounting metadata on the zpool to be broken, and
+ as such is somewhat tricky to reproduce organically. A regular test plan for
+ an affected pool should be:
+ 1. Set up an encrypted zpool under 20.04
+ 2. Upgrade the system to 22.04 (e.g. using the do-release-upgrade script)
+ 3. Verify that the zpool fails to mount under 22.04 (zpool status will likely
+ point to ZFS-8000-8A "Corrupted data" [0])
+
+ [0] https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A/
+
+ Thankfully, upstream has included a test scenario for this under the ZFS test
+ suite, which is run during build. The
+ tests/zfs-tests/tests/functional/userquota/13709_reproducer.bz2 file is taken
+ directly from upstream, and corresponds to an encrypted zpool with the
+ required (broken) metadata to reproduce this issue. If the ZFS test suite
+ passes, this should give us a strong signal that this issue is fixed.
+
+ [ Where problems could occur ]
+ Although I've backported the upstream test, it'd be great to have confirmation
+ from affected users that this patch resolves the issue. Additionally, we
+ should also perform upgrades in non-affected zpools as well as non-encrypted
+ zpools, to ensure no regressions have been introduced.
+
+ Considering this change affects the encrypt/decrypt code paths, problems could
+ arise in creating new encrypted zpools, as well as when mounting zpools that
+ have been previously encrypted.
+
+ [ Other Info ]
+ This SRU includes slightly more changes than the minimal changes mentioned in
+ the SRU policy, as I've also backported one of upstream's tests for encrypted
+ pools. This included a new test script (userspace_encrypted_13709.ksh), as
+ well as a binary zpool dump (13709_reproducer.bz2) that I've added under
+ d/s/include-binaries.
+
+ Considering this issue causes zpools to become unmountable, I think it's worth
+ including these in the standard ZFS test suite (similar to an autopkgtest
+ scenario for a high-risk regression). These are included in future releases of
+ zfs-linux, and as such only Jammy is affected by this regression.
+ --
+
+ [ Original Description ]
+ I have a server that has been running its data volume using ZFS in 20.04
+ without any problem. The volume is using ZFS encryption and a raidz1-0
+ configuration. I performed a scrub operation before the upgrade and it did
+ not find any problem. After the reboot for the upgrade, I was welcomed with
+ the following message:
status: One or more devices has experienced an error resulting in data
- corruption. Applications may be affected.
+ corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
- entire pool from backup.
- see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
+ entire pool from backup.
+ see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
The volumes still do not have any checksum error but there are 5 zvols
that are not accessible. zpool status displays a line similar to the
below for each of the five:
- errors: Permanent errors have been detected in the following files:
-
- tank/data/data:<0x0>
+ errors: Permanent errors have been detected in the following files:
+
+ tank/data/data:<0x0>
I ran a scrub and it did not identify any problem, but the error
messages are not there and the data is still not available. There are
10+ other zvols in the zpool that do not have any kind of problem. I
have been unable to identify any correlation between the zvols that are
failing.
I have seen people reporting similar problems on GitHub after the 20.04
to 22.04 upgrade (see https://github.com/openzfs/zfs/issues/13763).
I wonder how widespread the problem will be as more people upgrade to
22.04.
I will try to downgrade the version of zfs in the system and report back.
ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: zfsutils-linux 2.1.4-0ubuntu0.1
ProcVersionSignature: Ubuntu 5.15.0-46.49-generic 5.15.39
Uname: Linux 5.15.0-46-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
CasperMD5CheckResult: unknown
Date: Sat Aug 20 22:24:54 2022
ProcEnviron:
- TERM=screen-256color
- PATH=(custom, no user)
- XDG_RUNTIME_DIR=<set>
- LANG=en_US.UTF-8
- SHELL=/bin/bash
+ TERM=screen-256color
+ PATH=(custom, no user)
+ XDG_RUNTIME_DIR=<set>
+ LANG=en_US.UTF-8
+ SHELL=/bin/bash
SourcePackage: zfs-linux
UpgradeStatus: Upgraded to jammy on 2022-08-20 (0 days ago)
modified.conffile..etc.sudoers.d.zfs: [inaccessible: [Errno 13] Permission
denied: '/etc/sudoers.d/zfs']
** Also affects: zfs-linux (Ubuntu Jammy)
Importance: Undecided
Status: New
** Changed in: zfs-linux (Ubuntu Jammy)
Assignee: (unassigned) => Heitor Alves de Siqueira (halves)
** Changed in: zfs-linux (Ubuntu Jammy)
Importance: Undecided => High
** Changed in: zfs-linux (Ubuntu Jammy)
Status: New => Incomplete
** Changed in: zfs-linux (Ubuntu Jammy)
Status: Incomplete => In Progress
** Changed in: zfs-linux (Ubuntu)
Status: In Progress => Fix Released
--
https://bugs.launchpad.net/bugs/1987190
Title:
ZFS unrecoverable error after upgrading from 20.04 to 22.04.1
Status in zfs-linux package in Ubuntu:
Fix Released
Status in zfs-linux source package in Jammy:
In Progress
Bug description:
[ Impact ]
Upgrading from 20.04 to 22.04 causes encrypted pools to become unmountable.
This is due to broken accounting metadata causing checksum errors on decrypt,
which makes ZFS error out early with ECKSUM.
[ Test Plan ]
This issue needs specific accounting metadata on the zpool to be broken, and
as such is somewhat tricky to reproduce organically. A regular test plan for
an affected pool should be:
1. Set up an encrypted zpool under 20.04
2. Upgrade the system to 22.04 (e.g. using the do-release-upgrade script)
3. Verify that the zpool fails to mount under 22.04 (zpool status will likely
point to ZFS-8000-8A "Corrupted data" [0])
[0] https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A/
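Step 3 of the test plan can be checked mechanically. The following is a minimal sketch, assuming the text of zpool status has already been captured (for example with subprocess.run(["zpool", "status"], capture_output=True, text=True)). The helper name reports_corrupted_data is hypothetical and not part of any ZFS tooling; it only looks for the ZFS-8000-8A message ID on the "see:" line.

```python
import re

# Message ID referenced in the test plan ("Corrupted data").
CORRUPTED_DATA_MSGID = "ZFS-8000-8A"


def reports_corrupted_data(status_text: str) -> bool:
    """Return True if a 'see:' line of zpool status output cites ZFS-8000-8A.

    Hypothetical helper for illustration; it inspects captured text only
    and never invokes zpool itself.
    """
    for line in status_text.splitlines():
        # The URL may or may not carry a trailing slash, so match on the ID.
        if re.match(r"\s*see:", line) and CORRUPTED_DATA_MSGID in line:
            return True
    return False


# Sample taken from the status output quoted in this report.
sample = """\
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
"""
print(reports_corrupted_data(sample))
```

A check like this only confirms the symptom; it says nothing about whether the underlying accounting metadata is the cause.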
Thankfully, upstream has included a test scenario for this under the ZFS test
suite, which is run during build. The
tests/zfs-tests/tests/functional/userquota/13709_reproducer.bz2 file is taken
directly from upstream, and corresponds to an encrypted zpool with the
required (broken) metadata to reproduce this issue. If the ZFS test suite
passes, this should give us a strong signal that this issue is fixed.
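For anyone who wants to inspect the reproducer by hand, the sketch below unpacks a bzip2-compressed file using only the Python standard library. It assumes the reproducer is a plain bzip2 stream of a pool image; the helper name unpack_reproducer and any target path are hypothetical, and how the unpacked image is then imported is not covered here.

```python
import bz2
import shutil
from pathlib import Path


def unpack_reproducer(archive: Path, target: Path) -> int:
    """Decompress a .bz2 file to `target` and return the bytes written.

    Hypothetical helper; streams in chunks so a large pool image is not
    loaded into memory at once.
    """
    with bz2.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target.stat().st_size


# Example call (target path is an assumption, adjust as needed):
# unpack_reproducer(
#     Path("tests/zfs-tests/tests/functional/userquota/13709_reproducer.bz2"),
#     Path("/tmp/13709_reproducer.img"),
# )
```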
[ Where problems could occur ]
Although I've backported the upstream test, it'd be great to have confirmation
from affected users that this patch resolves the issue. Additionally, we
should also perform upgrades in non-affected zpools as well as non-encrypted
zpools, to ensure no regressions have been introduced.
Considering this change affects the encrypt/decrypt code paths, problems could
arise in creating new encrypted zpools, as well as when mounting zpools that
have been previously encrypted.
[ Other Info ]
This SRU includes slightly more changes than the minimal changes mentioned in
the SRU policy, as I've also backported one of upstream's tests for encrypted
pools. This included a new test script (userspace_encrypted_13709.ksh), as
well as a binary zpool dump (13709_reproducer.bz2) that I've added under
d/s/include-binaries.
Considering this issue causes zpools to become unmountable, I think it's worth
including these in the standard ZFS test suite (similar to an autopkgtest
scenario for a high-risk regression). These are included in future releases of
zfs-linux, and as such only Jammy is affected by this regression.
--
[ Original Description ]
I have a server that has been running its data volume using ZFS in 20.04
without any problem. The volume is using ZFS encryption and a raidz1-0
configuration. I performed a scrub operation before the upgrade and it did
not find any problem. After the reboot for the upgrade, I was welcomed with
the following message:
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
The volumes still do not have any checksum error but there are 5 zvols
that are not accessible. zpool status displays a line similar to the
below for each of the five:
errors: Permanent errors have been detected in the following files:
tank/data/data:<0x0>
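To tally which zvols are affected, the error list can be pulled out of captured zpool status text. This is a rough sketch that assumes the layout quoted above (a header line followed by one entry per line); the helper names permanent_error_entries and dataset_of are hypothetical.

```python
from __future__ import annotations


def permanent_error_entries(status_text: str) -> list[str]:
    """Collect the entries listed under the 'Permanent errors' header.

    Hypothetical helper; assumes the layout quoted in this report: a header
    line followed by blank lines and one entry per line.
    """
    header = "Permanent errors have been detected in the following files:"
    entries = []
    in_list = False
    for line in status_text.splitlines():
        if header in line:
            in_list = True
            continue
        if in_list and line.strip():
            entries.append(line.strip())
    return entries


def dataset_of(entry: str) -> str:
    """Return the part before the last ':' (the dataset in 'pool/ds:<0x0>')."""
    return entry.rsplit(":", 1)[0]


# Sample taken from the error listing quoted in this report.
sample = """\
errors: Permanent errors have been detected in the following files:

        tank/data/data:<0x0>
"""
for entry in permanent_error_entries(sample):
    print(dataset_of(entry), entry)
```

Entries that are ordinary file paths containing a colon would be split at the wrong place, so this is only suitable for the dataset:<object> form shown here.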
I ran a scrub and it did not identify any problem, but the error
messages are not there and the data is still not available. There are
10+ other zvols in the zpool that do not have any kind of problem. I
have been unable to identify any correlation between the zvols that
are failing.
I have seen people reporting similar problems on GitHub after the
20.04 to 22.04 upgrade (see
https://github.com/openzfs/zfs/issues/13763). I wonder how widespread
the problem will be as more people upgrade to 22.04.
I will try to downgrade the version of zfs in the system and report
back.
ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: zfsutils-linux 2.1.4-0ubuntu0.1
ProcVersionSignature: Ubuntu 5.15.0-46.49-generic 5.15.39
Uname: Linux 5.15.0-46-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
CasperMD5CheckResult: unknown
Date: Sat Aug 20 22:24:54 2022
ProcEnviron:
TERM=screen-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: zfs-linux
UpgradeStatus: Upgraded to jammy on 2022-08-20 (0 days ago)
modified.conffile..etc.sudoers.d.zfs: [inaccessible: [Errno 13] Permission
denied: '/etc/sudoers.d/zfs']