On Tue, Oct 14, 2014 at 9:45 PM, Steven Hartland via illumos-zfs <[email protected]> wrote:
> ----- Original Message ----- From: "Steven Hartland"
>
>> I've been investigating an issue for a user who was seeing his pool
>> import hang after upgrading on FreeBSD. After digging around, it
>> turned out the issue was due to lack of free space on the pool.
>>
>> As the pool imports it writes, hence requiring space, but the pool
>> has so little space that this was failing. The I/O being a required
>> I/O, it retries, but obviously fails again, resulting in the pool
>> being suspended, hence the hang.
>>
>> With the pool suspended during import it still holds the pool lock,
>> so all attempts to query the status also hang, which is one problem
>> as the user can't tell why the hang has occurred.
>>
>> During the debugging I mounted the pool read-only and sent a copy to
>> another empty pool, which resulted in ~1/2 capacity being recovered.
>> This seemed odd but I dismissed it at the time.
>>
>> The machine was then left, with the pool not being accessed; however,
>> I just received an alert from our monitoring for a pool failure. On
>> looking I now see the new pool I created with 2 write errors and no
>> free space. So just having the pool mounted, with no access
>> happening, has managed to use the remaining 2GB on the 4GB pool.
>>
>> Has anyone seen this before, or has any ideas what might be going on?
>>
>> zdb -m -m -m -m <pool> shows allocation to transactions, e.g.
>>   metaslab 100   offset c8000000   spacemap 1453   free 0
>>     segments 0   maxsize 0   freepct 0%
>>     In-memory histogram:
>>     On-disk histogram:  fragmentation 0
>>     [ 0] ALLOC: txg 417, pass 2
>>     [ 1]    A  range: 00c8000000-00c8001600  size: 001600
>>     [ 2] ALLOC: txg 417, pass 3
>>     [ 3]    A  range: 00c8001600-00c8003a00  size: 002400
>>     [ 4] ALLOC: txg 418, pass 2
>>     [ 5]    A  range: 00c8003a00-00c8005000  size: 001600
>>     [ 6] ALLOC: txg 418, pass 3
>>     [ 7]    A  range: 00c8005000-00c8006600  size: 001600
>>     [ 8] ALLOC: txg 419, pass 2
>>     [ 9]    A  range: 00c8006600-00c8007c00  size: 001600
>>     [10] ALLOC: txg 419, pass 3
>>
>> I tried destroying the pool and that hung, presumably due to I/O
>> being suspended after the out-of-space errors.
>
> After bisecting the kernel changes, the commit which seems to be
> causing this is:
> https://svnweb.freebsd.org/base?view=revision&revision=268650
> https://github.com/freebsd/freebsd/commit/91643324a9009cb5fbc8c00544b7781941f0d5d1
> which correlates to:
> https://github.com/illumos/illumos-gate/commit/7fd05ac4dec0c343d2f68f310d3718b715ecfbaf
>
> I've checked that the two make the same changes, so there doesn't seem
> to have been a downstream merge issue, at least not on this specific
> commit.
>
> My test now consists of:
> 1. mdconfig -t malloc -s 4G -S 512
> 2. zpool create tpool md0
> 3. zfs recv -duF tpool < test.zfs
> 4. zpool list -p -o free tpool 5
>
> With this commit present, free reduces every 5 seconds until the pool
> is out of space. Without it, after at most 3 reductions the pool
> settles and no further free-space reduction is seen.

How did you generate the send stream? Please include the exact command
you used. How big is the send stream? Can you provide the send stream?
What is the output of "zfs recv"? Does it fail? What is the "freeing"
property? What is the "leaked" property? If you write to the pool
(before it runs completely out of space), does the free space jump back
up to where it should be?
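As a sanity check on the zdb output quoted above: each "A range" segment's size is simply end minus start, and the pass-2/pass-3 allocations recur on every txg even though the pool is idle. A small sketch (a hand-rolled check of the quoted excerpt, not a zdb tool) that verifies the arithmetic and totals the growth:

```python
# Segments copied from the quoted "zdb -m -m -m -m" excerpt, as
# (start, end, size) hex strings. This list is transcribed by hand;
# it is not produced by any zdb parsing API.
segments = [
    ("00c8000000", "00c8001600", "001600"),  # txg 417, pass 2
    ("00c8001600", "00c8003a00", "002400"),  # txg 417, pass 3
    ("00c8003a00", "00c8005000", "001600"),  # txg 418, pass 2
    ("00c8005000", "00c8006600", "001600"),  # txg 418, pass 3
    ("00c8006600", "00c8007c00", "001600"),  # txg 419, pass 2
]

total = 0
for start, end, size in segments:
    # The reported size matches end - start for every segment.
    assert int(end, 16) - int(start, 16) == int(size, 16)
    total += int(size, 16)

# A few KiB is allocated per txg with no user I/O, so the pool's
# "free" property shrinks on every sync until the pool fills.
print(total)  # 31744 bytes across txgs 417-419
```

This matches the bisection observation: the per-txg allocations never stop, so "free" drops every sync interval.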
I tried a simple test on my system, sending one pool to another and
receiving with the flags you mentioned (on DelphixOS, which is not too
different from illumos trunk), and the problem did not reproduce. After
the receive, the pool is idle (no I/O observed with iostat, no change
to the "free" property).

I took a look at the pool image you linked. The "leaked" space is
consumed by deferred frees ("zdb -bb" will tell you this). My first
guess would be that something is causing us to not process the deferred
free list when we should, i.e. spa_sync() is not calling
spa_sync_deferred_frees(), but it is subsequently syncing real changes
to disk and adding more deferred frees. This code looks somewhat
fragile -- we are essentially checking "will there be changes this
txg", but the answer is not enforced. There were some changes to
dsl_scan_active() as part of the commit you mentioned, which could be
involved.

--matt

> I've also found that creating the pool without async_destroy enabled
> also prevents the issue.
>
> An image that shows the final result of the leak can be found here:
> http://www.ijs.si/usr/mark/bsd/
>
> On FreeBSD this image stalls on import unless imported read-only.
> Once imported I used the following to create the test image used
> above:
> zfs send -R zfs/ROOT@auto-2014-09-19_22.30 >test.zfs
>
> Copying in the zfs illumos list to get more eyeballs given it seems
> to be quite a serious issue.
>
> Regards
> Steve
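The fragile pattern Matt describes -- draining the deferred-free list only when an up-front "will there be changes this txg?" check passes, while a later stage of the same sync adds new deferred frees -- can be modeled with a toy sync loop. This is purely illustrative: the class and method names are invented for the sketch and are not the illumos spa_sync() code.

```python
# Toy model of a sync loop that leaks space: deferred frees are only
# processed when an up-front activity check says there is work, but new
# deferred frees are queued *after* that check, so the check is never
# enforced against them. All names here are invented for illustration.

class Pool:
    def __init__(self, free):
        self.free = free
        self.deferred = []          # bytes queued for deferred freeing

    def scan_active(self):
        # Stands in for a dsl_scan_active()-style check that (in this
        # broken scenario) reports "nothing to do this txg".
        return False

    def sync_txg(self):
        if self.deferred and self.scan_active():
            # Intended path: drain deferred frees, returning the space.
            self.free += sum(self.deferred)
            self.deferred.clear()
        # Later in the same sync, bookkeeping changes are written and
        # their old blocks are deferred -- so even an idle pool shrinks.
        self.free -= 5632
        self.deferred.append(5632)

pool = Pool(free=2 * 1024**3)       # the ~2GB that leaked on the 4GB pool
for _ in range(10):
    pool.sync_txg()

print(pool.free)            # drops by 5632 bytes per txg, never recovers
print(sum(pool.deferred))   # the "leaked" space sits on the deferred list
```

The model shows why "zdb -bb" attributes the missing space to deferred frees: nothing is lost, it just accumulates on a list that the gating check never lets the sync loop drain.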
