Here is another problem that seems potentially related to
http://thread.gmane.org/gmane.comp.file-systems.openzfs.devel/2911/focus=2917
but could be something different.

So far reproduced only on FreeBSD.

panic: solaris assert: used > 0 ||
dsl_dir_phys(dd)->dd_used_breakdown[type] >= -used, file:
/usr/devel/svn/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c,
line: 1389
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004db89410
vpanic() at vpanic+0x182/frame 0xfffffe004db89490
panic() at panic+0x43/frame 0xfffffe004db894f0
assfail() at assfail+0x1a/frame 0xfffffe004db89500
dsl_dir_diduse_space() at dsl_dir_diduse_space+0x200/frame 0xfffffe004db89580
dsl_dataset_clone_swap_sync_impl() at dsl_dataset_clone_swap_sync_impl+0x3f5/frame 0xfffffe004db89690
dsl_dataset_rollback_sync() at dsl_dataset_rollback_sync+0x11d/frame 0xfffffe004db897f0
dsl_sync_task_sync() at dsl_sync_task_sync+0xef/frame 0xfffffe004db89820
dsl_pool_sync() at dsl_pool_sync+0x45b/frame 0xfffffe004db89890
spa_sync() at spa_sync+0x7c7/frame 0xfffffe004db89ad0
txg_sync_thread() at txg_sync_thread+0x383/frame 0xfffffe004db89bb0

After the panic the pool is left in a state where running the same
rollback command (zfs rollback zroot2/test/4@1) results in the same panic.

Some data from the affected datasets:

zroot2/test/4  used                  1076224                -
zroot2/test/4  referenced            51200                  -
zroot2/test/4  usedbysnapshots       1024                   -
zroot2/test/4  usedbydataset         40960                  -
zroot2/test/4  usedbychildren        1034240                -
zroot2/test/4  origin                zroot2/test/1@4        -

zroot2/test/4@1  used                  1024                   -
zroot2/test/4@1  referenced            50176                  -
zroot2/test/4@1  clones                zroot2/test/3/2/2/4    -

zroot2/test/1@4  used                  24576                  -
zroot2/test/1@4  referenced            50176                  -
zroot2/test/1@4  clones                zroot2/test/4          -

zroot2/test/1  used                  318464                 -
zroot2/test/1  referenced            50176                  -
zroot2/test/1  usedbysnapshots       25600                  -
zroot2/test/1  usedbydataset         50176                  -
zroot2/test/1  usedbychildren        242688                 -
zroot2/test/1  origin    -       -

Using a debugger, I determined that zroot2/test/4 has a deadlist of size
40960.  So, in dsl_dataset_clone_swap_sync_impl() dused is calculated as:
dused = 50176 + 0 - (51200 + 40960) = -41984

This value is then passed to:
                dsl_dir_diduse_space(origin_head->ds_dir, DD_USED_HEAD,
                    dused, dcomp, duncomp, tx);

And dd_used_breakdown[DD_USED_HEAD] is 40960 there, so the assertion
prevents it from going into negative territory.

I am not sure how the datasets got into this state.
I can provide any additional data that can be queried from the pool or
from the crash dump.

-- 
Andriy Gapon


-------------------------------------------
openzfs-developer