FWIW,
 
 I can replicate this behavior using OmniOS bleeding edge and stable as the ZFS 
stream source to an older system.
 
 Here is some truss output showing the same. If it can’t receive the FS at all, 
it bails as one would expect (if the destination dataset already exists, for 
example), so the hang is happening once the stream is supposed to begin receiving.
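 
 For reference, the trace below came from an invocation along these lines (the 
stream file name here is just a placeholder):
 
 truss zfs recv -v dpool01/omnitest <joetest-stream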
 
 Not as useful as the mdb output, but… anyone have any ideas here?
 
 Joe
 
 mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEC70000
 memcntl(0xFEC80000, 24216, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
 open("/dev/zfs", O_RDWR)                        = 3
 fstat64(3, 0x08047B30)                          = 0
 stat64("/dev/pts/0", 0x08047BC0)                = 0
 open("/etc/mnttab", O_RDONLY)                   = 4
 fstat64(4, 0x08047AD0)                          = 0
 open("/etc/dfs/sharetab", O_RDONLY)             = 5
 fstat64(5, 0x08047AD0)                          = 0
 stat64("/lib/libzfs_core.so.1", 0x08047278)     = 0
 resolvepath("/lib/libzfs_core.so.1", "/lib/libzfs_core.so.1", 1023) = 21
 open("/lib/libzfs_core.so.1", O_RDONLY)         = 6
 mmapobj(6, MMOBJ_INTERPRET, 0xFEC70960, 0x080472E4, 0x00000000) = 0
 close(6)                                        = 0
 memcntl(0xFEC50000, 3800, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
 open("/dev/zfs", O_RDWR)                        = 6
 fstat64(6, 0x08047B20)                          = 0
 stat64("/dev/pts/0", 0x08047BB0)                = 0
 stat64("/lib/libavl.so.1", 0x08047248)          = 0
 resolvepath("/lib/libavl.so.1", "/lib/libavl.so.1", 1023) = 16
 open("/lib/libavl.so.1", O_RDONLY)              = 7
 mmapobj(7, MMOBJ_INTERPRET, 0xFECC0DD0, 0x080472B4, 0x00000000) = 0
 close(7)                                        = 0
 mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEC20000
 memcntl(0xFEC30000, 2416, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
 open("/etc/mnttab", O_RDONLY)                   = 7
 fstat64(7, 0x08047AE0)                          = 0
 sysconfig(_CONFIG_PAGESIZE)                     = 4096
 ioctl(0, TCGETA, 0x08047C80)                    Err#25 ENOTTY
 open("/dev/zfs", O_RDWR|O_EXCL)                 = 8
 fstat64(8, 0x08047AB0)                          = 0
 stat64("/dev/pts/3", 0x08047B40)                Err#2 ENOENT
 open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSLIB.mo", O_RDONLY) 
Err#2 ENOENT
 read(0, "\0\0\0\0\0\0\0\0ACCBBAF5".., 312)      = 312
 read(0, 0x080475B8, 0)                          = 0
 time()                                          = 1478531928
 stat64("/lib/libnvpair.so.1", 0x08042F78)       = 0
 resolvepath("/lib/libnvpair.so.1", "/lib/libnvpair.so.1", 1023) = 19
 open("/lib/libnvpair.so.1", O_RDONLY)           = 9
 mmapobj(9, MMOBJ_INTERPRET, 0xFEC204C0, 0x08042FE4, 0x00000000) = 0
 close(9)                                        = 0
 memcntl(0xFEB60000, 18964, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
 ioctl(3, ZFS_IOC_OBJSET_STATS, 0x08042400)      = 0
 brk(0x0811C000)                                 = 0
 ioctl(3, ZFS_IOC_POOL_STATS, 0x08040D70)        = 0
 brk(0x0812C000)                                 = 0
 ioctl(3, ZFS_IOC_OBJSET_STATS, 0x08042400)      Err#2 ENOENT
 ioctl(1, TCGETA, 0x08042AC0)                    = 0
 fstat64(1, 0x08042A20)                          = 0
 receiving full stream of dpool01/joetest@now into dpool01/omnitest@now
 write(1, " r e c e i v i n g   f u".., 71)      = 71
 
 On 11/3/16, 8:18 AM, "Hetrick, Joseph P" <joseph-hetr...@uiowa.edu> wrote:
 
 Per Alex's suggestion, here is where ZFS is at during the hang period (::stacks -m zfs from mdb -k):
 
 THREAD           STATE    SOBJ                COUNT
 ffffff007a8b3c40 SLEEP    CV                      3
                  swtch+0x145
                  cv_timedwait_hires+0xe0
                  cv_timedwait+0x5a
                  txg_thread_wait+0x7c
                  txg_sync_thread+0x118
                  thread_start+8
 
 ffffff007a292c40 SLEEP    CV                      3
                  swtch+0x145
                  cv_wait+0x61
                  spa_thread+0x225
                  thread_start+8
 
 ffffff007a8aac40 SLEEP    CV                      3
                  swtch+0x145
                  cv_wait+0x61
                  txg_thread_wait+0x5f
                  txg_quiesce_thread+0x94
                  thread_start+8
 
 ffffff007a1bbc40 SLEEP    CV                      1
                  swtch+0x145
                  cv_timedwait_hires+0xe0
                  cv_timedwait+0x5a
                  arc_reclaim_thread+0x13d
                  thread_start+8
 
 ffffff007a1c1c40 SLEEP    CV                      1
                  swtch+0x145
                  cv_timedwait_hires+0xe0
                  cv_timedwait+0x5a
                  l2arc_feed_thread+0xa1
                  thread_start+8
 
 ffffff11bde0f4a0 ONPROC   <NONE>                  1
                  mutex_exit
                  dbuf_hold_impl+0x81
                  dnode_next_offset_level+0xee
                  dnode_next_offset+0xa2
                  dmu_object_next+0x54
                  restore_freeobjects+0x7e
                  dmu_recv_stream+0x7f1
                  zfs_ioc_recv+0x416
                  zfsdev_ioctl+0x347
                  cdev_ioctl+0x45
                  spec_ioctl+0x5a
                  fop_ioctl+0x7b
                  ioctl+0x18e
                  _sys_sysenter_post_swapgs+0x149
 
 echo "::stacks -m zfs" |mdb -k
 THREAD           STATE    SOBJ                COUNT
 ffffff007a8b3c40 SLEEP    CV                      3
                  swtch+0x145
                  cv_timedwait_hires+0xe0
                  cv_timedwait+0x5a
                  txg_thread_wait+0x7c
                  txg_sync_thread+0x118
                  thread_start+8
 
 ffffff007a292c40 SLEEP    CV                      3
                  swtch+0x145
                  cv_wait+0x61
                  spa_thread+0x225
                  thread_start+8
 
 ffffff007a8aac40 SLEEP    CV                      3
                  swtch+0x145
                  cv_wait+0x61
                  txg_thread_wait+0x5f
                  txg_quiesce_thread+0x94
                  thread_start+8
 
 ffffff007a1bbc40 SLEEP    CV                      1
                  swtch+0x145
                  cv_timedwait_hires+0xe0
                  cv_timedwait+0x5a
                  arc_reclaim_thread+0x13d
                  thread_start+8
 
 ffffff007a1c1c40 SLEEP    CV                      1
                  swtch+0x145
                  cv_timedwait_hires+0xe0
                  cv_timedwait+0x5a
                  l2arc_feed_thread+0xa1
                  thread_start+8
 
 ffffff11bde0f4a0 ONPROC   <NONE>                  1
                  dbuf_hash+0xdc
                  0xffffff11ca05c460
                  dbuf_hold_impl+0x59
                  dnode_next_offset_level+0xee
                  dnode_next_offset+0xa2
                  dmu_object_next+0x54
                  restore_freeobjects+0x7e
                  dmu_recv_stream+0x7f1
                  zfs_ioc_recv+0x416
                  zfsdev_ioctl+0x347
                  cdev_ioctl+0x45
                  spec_ioctl+0x5a
                  fop_ioctl+0x7b
                  ioctl+0x18e
                  _sys_sysenter_post_swapgs+0x149
 
 Where the action was:
 
 zfs recv -v dpool01/wtf <test-15-out
 receiving full stream of dpool01/test@now into dpool01/wtf@now
 
 test-15-out was produced with zfs send dpool01/test@now >test-15-out and then sent over to the node.
 
 It’s only about 48K in size, with no filesystem data (though the problem also 
exists when I have a filesystem with data).
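 
 In case it helps, the whole repro boils down to the following; the copy step in 
the middle is just a placeholder for however the stream file gets to the other node:
 
 # on the newer (sending) host
 zfs send dpool01/test@now >test-15-out
 
 # copy test-15-out to the older host by whatever means
 
 # on the older (receiving) host -- this is where the recv hangs
 zfs recv -v dpool01/wtf <test-15-out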
 
 I’ve created a few identical filesystems on a few nodes and done some hex 
compares with them, but nothing extensive beyond eyeballing the differences.
 
 Thanks Alex,
 
 Joe
 
 On 11/2/16, 11:15 AM, "Hetrick, Joseph P" <joseph-hetr...@uiowa.edu> wrote:
 
 Hi folks,
 
 We’ve run into an odd issue that seems concerning.
 
 Our shop runs OpenIndiana and we’ve got several versions in play.  Recently, 
while testing a new system that is much more recent (a bleeding-edge OI Hipster 
release), we discovered that zfs sends to older systems caused hangs.  By older, 
we’re talking the same ZFS/zpool versions (5/28) and no visible property 
differences.
 
 I can provide more info if told what would be useful, but the gist is this:
 
 A zfs send of a vanilla dataset (no properties defined other than defaults) to 
any “older” system causes the recv to hang, and eventually the host will crash.  
Truss’ing the receiving process doesn’t seem to give a lot of info as to the cause: 
the filesystem snapshot is received, and then that’s it.
 
 No fancy send or recv args are in play (just zfs send of the dataset via netcat, 
mbuffer, or ssh into a zfs recv -v <dest>).
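 
 Concretely, the transfers look roughly like these (hostnames and destination 
dataset names below are placeholders, and the exact nc flags depend on which 
netcat is installed):
 
 # over ssh
 zfs send dpool01/test@now | ssh oldhost zfs recv -v dpool01/dest
 
 # over netcat: start the listener on the older (receiving) host first
 nc -l 9090 | zfs recv -v dpool01/dest
 zfs send dpool01/test@now | nc oldhost 9090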
 
 A close comparison of zfs and zpool properties shows no differences.  On a whim 
we even created pools and datasets that were down-versioned below the senders’.
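 
 Roughly, the property comparison amounted to something like this (pool, dataset, 
and host names are placeholders):
 
 # dump dataset and pool properties on each host, then diff the files
 zfs get -H -o property,value,source all dpool01/test >/tmp/zfs-props.hostA
 zpool get all dpool01 >/tmp/zpool-props.hostA
 diff /tmp/zfs-props.hostA /tmp/zfs-props.hostB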
 
 We’ve seen this in hosts a bit later than illumos-a7317ce but not before (and 
certainly in builds a bit later still), up through where we are now: illumos-2816291.
 
 Oddly, illumos-a7317ce systems appear to be able to receive these datasets just 
fine… and we’ve had no problems with systems of that vintage sending to older 
systems.
 
 Any ideas and instructions are most welcome,
 
 Joe
 

