Hey all,

I sent this email on the 21st, but I apparently was not subscribed to the
mailing list, so it got silently dropped.  Here it is again!

Sorry for the delay in getting to this; I had some other stuff I was
working on.  I've confirmed that a single patch both causes this issue and
fixes it: 6393 "zfs receive a full send as a clone".  As part of that patch,
we started sending all the holes in files again, along with all the
FREEOBJECTS records in the dataset.  This results in a DRR_FREEOBJECTS
record spanning from the last object in the dataset to a very large object
number, so that all objects after the last one will be freed when
receiving.  The same patch also contains a fix for a bug in
receive_freeobjects.  Prior to the fix, the loop in receive_freeobjects
didn't check the return value of dmu_object_next, so if it could not find
an object in the range provided, it would go through the loop drr_numobjs
times.  In this case, that's 36,028,797,018,870,144 times (just under
2^55), which understandably presents as a hang.  In addition, that loop
doesn't check for signals, so you can't ctrl-c the process either.
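
To make the loop behavior concrete, here is a minimal user-space sketch in
C.  It is not the actual ZFS code: mock_object_next, free_range_unchecked,
free_range_checked, and the toy "allocated" table are all hypothetical
names made up for illustration, and the way the mock steps the object
number forward by one on failure is an assumption of this model, chosen to
match the "goes through the loop drr_numobjs times" behavior described
above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a dataset: a sorted list of allocated object numbers. */
static const uint64_t allocated[] = { 1, 2, 5, 9 };
static const size_t nallocated = sizeof (allocated) / sizeof (allocated[0]);

/*
 * Hypothetical stand-in for dmu_object_next(): advance *obj to the next
 * allocated object after it and return 0.  If there is none, return ESRCH
 * and (an assumption of this model) step *obj forward by one.
 */
static int
mock_object_next(uint64_t *obj)
{
	for (size_t i = 0; i < nallocated; i++) {
		if (allocated[i] > *obj) {
			*obj = allocated[i];
			return (0);
		}
	}
	(*obj)++;
	return (ESRCH);
}

/* Pre-fix shape: return value ignored, so the loop walks the whole range. */
static uint64_t
free_range_unchecked(uint64_t firstobj, uint64_t numobjs)
{
	uint64_t iterations = 0;

	for (uint64_t obj = firstobj; obj < firstobj + numobjs;
	    (void) mock_object_next(&obj)) {
		iterations++;	/* real code would try to free obj here */
	}
	return (iterations);
}

/* Post-fix shape: stop as soon as no further object can be found. */
static uint64_t
free_range_checked(uint64_t firstobj, uint64_t numobjs)
{
	uint64_t iterations = 0;
	int next_err = 0;

	for (uint64_t obj = firstobj;
	    obj < firstobj + numobjs && next_err == 0;
	    next_err = mock_object_next(&obj)) {
		iterations++;
	}
	/* In this model the only possible failure is "no more objects". */
	assert(next_err == 0 || next_err == ESRCH);
	return (iterations);
}
```

With a range that starts past the last allocated object, the unchecked
variant iterates once per object number in the range, while the checked
variant stops after the first failed lookup.  Scale the range up to the
size quoted above and the unchecked loop effectively never finishes.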

There are two possible solutions to this issue.  First, you can upgrade
your receiving system to include the fix for receive_freeobjects.  Second,
you can upgrade your sending system to include the fix for 6536 "zfs send:
want a way to disable setting of DRR_FLAG_FREERECORDS", and then set
zfs_send_set_freerecords_bit to B_FALSE on the sender.  If your sending
system has r151018, then you already have that commit, so setting that
tunable should make your streams receivable again.


>> On Tue, Nov 15, 2016 at 6:24 PM, Dan McDonald <dan...@omniti.com> wrote:
>>
>>> As discussed on the OpenZFS slack, I've seen this bug affect OmniOS
>>> r151014 and r151016.  Now note that r151014 has some backported bugfixes in
>>> it from later releases, BUT I know that r151018 does not have this problem.
>>>
>>> The send-stream I have which tickles this bug is the current OmniOS
>>> bloody ZFS stream for our PXE installer "kayak":
>>>
>>>         http://omnios.omniti.com/media/r151021-20161109.zfs.bz2
>>>
>>> Here is the list of bugfixes that are in r151018, but not in r151016
>>> (and r151014):
>>>
>>>     6385 Fix unlocking order in zfs_zget
>>>     6334 Cannot unlink files when over quota
>>>     6421 Add missing multilist_destroy calls to arc_fini
>>>     6388 Failure of userland copy should return EFAULT
>>>     6414 vdev_config_sync could be simpler
>>>     6434 sa_find_sizes() may compute wrong SA header size
>>>     6051 lzc_receive: allow the caller to read the begin record
>>>     6393 zfs receive a full send as a clone
>>>     6494 ASSERT supported zio_types for file and disk vdevs
>>>     6495 Fix mutex leak in dmu_objset_find_dp
>>>     6527 Possible access beyond end of string in zpool comment
>>>     6529 Properly handle updates of variably-sized SA entries.
>>>     6537 Panic on zpool scrub with DEBUG kernel
>>>     6450 scrub/resilver unnecessarily traverses snapshots created after
>>> the scrub started
>>>     6536 zfs send: want a way to disable setting of DRR_FLAG_FREERECORDS
>>>     6637 replacing "dontclose" with "should_close"
>>>     6541 Pool feature-flag check defeated if "verify" is included in the
>>> dedup property value
>>>     6585 sha512, skein, and edonr have an unenforced dependency on
>>> extensible dataset
>>>     6603 zfeature_register() should verify ZFEATURE_FLAG_PER_DATASET
>>> implies SPA_FEATURE_EXTENSIBLE_DATASET
>>>     6672 arc_reclaim_thread() should use gethrtime() instead of
>>> ddi_get_lbolt()
>>>     6673 want a macro to convert seconds to nanoseconds and vice-versa
>>>     6370 ZFS send fails to transmit some holes
>>>     6681 zfs list burning lots of time in dodefault() via dsl_prop_*
>>>     6738 zfs send stream padding needs documentation
>>>     6841 Undirty freed spill blocks
>>>     6843 Make xattr dir truncate and remove in one tx
>>>     6842 Fix empty xattr dir causing lockup
>>>     6914 kernel virtual memory fragmentation leads to hang
>>>
>>> I may have missed one or two, OR I may have ones listed here that are
>>> not in r151014's backports, but one of these is likely what cures zfs recv
>>> from hanging with particularly interesting send streams like the one I have
>>> illustrated above.
>>>
>>> Dan
>>>
>>>



-- 
Paul Dagnelie



-------------------------------------------
illumos-discuss