Hey all, I sent this email on the 21st, but I apparently was not subscribed to the mailing list, so it got silently dropped. Here it is again!

Sorry for the delay in getting to this; I had some other stuff I was working on. I've confirmed that the patch that both causes this issue and fixes it is 6393 zfs receive a full send as a clone.

As part of that patch, we started sending all the holes in files again, and all the FREEOBJECTS records in the dataset. This results in a DRR_FREEOBJECTS record from the last object in the dataset to a very large object, so that all objects after that one will be freed when receiving.

This patch also contains a fix for a bug in receive_freeobjects. Prior to the fix, the loop in receive_freeobjects didn't check the return value of dmu_object_next, so if it could not find an object in the range provided, it would go through the loop drr_numobjs times. In this case, that's 36,028,797,018,870,144 times, which understandably presents as a hang. In addition, that loop doesn't check for signals, so you can't ctrl-c the process either.

There are two possible solutions to this issue:

First, upgrade your receiving system to include the fix for receive_freeobjects.

Second, upgrade your sending system to include 6536 (zfs send: want a way to disable setting of DRR_FLAG_FREERECORDS), and then set zfs_send_set_freerecords_bit to B_FALSE. If you have r151018, then you already have that commit, so setting that tunable should make your streams receivable again.

>> On Tue, Nov 15, 2016 at 6:24 PM, Dan McDonald <dan...@omniti.com> wrote:
>>
>>> As discussed on the OpenZFS slack, I've seen this bug affect OmniOS
>>> r151014 and r151016. Now note that r151014 has some backported bugfixes in
>>> it from later releases, BUT I know that r151018 does not have this problem.
>>>
>>> The send-stream I have which tickles this bug is the current OmniOS
>>> bloody ZFS stream for our PXE installer "kayak":
>>>
>>> http://omnios.omniti.com/media/r151021-20161109.zfs.bz2
>>>
>>> Here are the list of bugfixes that are in r151018, but not in r151016
>>> (and r151014):
>>>
>>> 6385 Fix unlocking order in zfs_zget
>>> 6334 Cannot unlink files when over quota
>>> 6421 Add missing multilist_destroy calls to arc_fini
>>> 6388 Failure of userland copy should return EFAULT
>>> 6414 vdev_config_sync could be simpler
>>> 6434 sa_find_sizes() may compute wrong SA header size
>>> 6051 lzc_receive: allow the caller to read the begin record
>>> 6393 zfs receive a full send as a clone
>>> 6494 ASSERT supported zio_types for file and disk vdevs
>>> 6495 Fix mutex leak in dmu_objset_find_dp
>>> 6527 Possible access beyond end of string in zpool comment
>>> 6529 Properly handle updates of variably-sized SA entries.
>>> 6537 Panic on zpool scrub with DEBUG kernel
>>> 6450 scrub/resilver unnecessarily traverses snapshots created after
>>> the scrub started
>>> 6536 zfs send: want a way to disable setting of DRR_FLAG_FREERECORDS
>>> 6637 replacing "dontclose" with "should_close"
>>> 6541 Pool feature-flag check defeated if "verify" is included in the
>>> dedup property value
>>> 6585 sha512, skein, and edonr have an unenforced dependency on
>>> extensible dataset
>>> 6603 zfeature_register() should verify ZFEATURE_FLAG_PER_DATASET
>>> implies SPA_FEATURE_EXTENSIBLE_DATASET
>>> 6672 arc_reclaim_thread() should use gethrtime() instead of
>>> ddi_get_lbolt()
>>> 6673 want a macro to convert seconds to nanoseconds and vice-versa
>>> 6370 ZFS send fails to transmit some holes
>>> 6681 zfs list burning lots of time in dodefault() via dsl_prop_*
>>> 6738 zfs send stream padding needs documentation
>>> 6841 Undirty freed spill blocks
>>> 6843 Make xattr dir truncate and remove in one tx
>>> 6842 Fix empty xattr dir causing lockup
>>> 6914 kernel virtual memory fragmentation leads to hang
>>>
>>> I may have missed one or two, OR I may have ones listed here that are
>>> not in r151014's backports, but one of these is likely what cures zfs recv
>>> from hanging with particularly interesting send streams like the one I have
>>> illustrated above.
>>>
>>> Dan

--
Paul Dagnelie
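
For anyone who wants to see the receive_freeobjects behavior described above in miniature, here is a rough Python model of it. This is only a sketch, not the illumos code: `object_next` is a hypothetical stand-in for dmu_object_next, the object numbers are made up, and the sketch assumes that a failed lookup leaves the cursor one object further along, which is what would produce the "drr_numobjs times" behavior.

```python
import errno


def object_next(allocated, obj):
    """Hypothetical stand-in for dmu_object_next (not the real API).

    Returns (0, n) where n is the smallest allocated object number
    greater than obj. If there is none, returns (ESRCH, obj + 1);
    advancing by one on failure is an assumption of this sketch.
    """
    later = [n for n in allocated if n > obj]
    if later:
        return 0, min(later)
    return errno.ESRCH, obj + 1


def free_objects_unchecked(allocated, firstobj, numobjs):
    """Model of the loop that ignores the lookup's return value."""
    freed, iterations = [], 0
    obj = firstobj
    while obj < firstobj + numobjs:
        iterations += 1
        if obj in allocated:
            freed.append(obj)
        # Return value discarded: the loop keeps stepping one object
        # at a time until it has covered the whole range.
        _err, obj = object_next(allocated, obj)
    return freed, iterations


def free_objects_checked(allocated, firstobj, numobjs):
    """Model of the loop that stops once the lookup reports no more objects."""
    freed, iterations = [], 0
    obj, err = firstobj, 0
    while obj < firstobj + numobjs and err == 0:
        iterations += 1
        if obj in allocated:
            freed.append(obj)
        err, obj = object_next(allocated, obj)
    return freed, iterations


if __name__ == "__main__":
    allocated = {5, 9}  # made-up object numbers
    print(free_objects_unchecked(allocated, 4, 1000))
    print(free_objects_checked(allocated, 4, 1000))
```

Both versions free the same objects; the difference is only in how many iterations it takes to get there. In the unchecked model the iteration count grows with the size of the range, so a range as large as the one in this stream would never finish in practice.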