So, I think I've narrowed it down to two things:

* ZFS tries to destroy the dataset every time the pool is imported, because
the destroy never finished the last time
* In the process, ZFS makes the kernel run out of memory and die

So I thought of two options, but I'm not sure either is right:

Option 1: "Destroy" is an atomic operation

If destroy is atomic, then I guess what it's trying to do is look up all the
blocks that need to be deleted/unlinked/released/freed (not sure which is the
right word). Once it has that list, it writes it to the ZIL (remember, this is
just my guess; correct me if I'm wrong!) and starts physically deleting the
blocks, until the operation is done and finally committed.

If this is the case, then the process restarts from scratch every time the
system is rebooted. But I read that, apparently, in previous versions,
rebooting while destroying a clone that was taking too long made the clone
reappear intact on the next boot. This, and the fact that zpool iostat shows
only reads and few or no writes, is what led me to think this is how it works.
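
For reference, this is roughly what I'm watching while the destroy runs (the
pool name "xx" is just a placeholder):

  zpool iostat -v xx 5   # per-vdev read/write stats, refreshed every 5 seconds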

So if this is the case, I'd like to abort the destroy. After importing the
pool I would have everything as it was, and maybe deleting the snapshots that
come before the clone's parent snapshot would speed up a later destroy, or I
could just leave the clone as it is (see the sketch below).
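
For the sake of argument, if the abort is possible, I imagine the steps would
look something like this (the dataset and snapshot names are made up, and I'm
not even sure my build supports a read-only import):

  zpool import -o readonly=on xx  # peek at the pool without the destroy resuming
  zpool export xx
  zpool import xx                 # real import, once the destroy is truly aborted
  zfs destroy xx/fs@old1          # drop snapshots older than the clone's origin
  zfs destroy xx/fs@old2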

Option 2: "Destroy" is not atomic

By "not atomic" I don't mean that if the operation is canceled it will be left
in an incomplete state, but rather that if the system is rebooted, the
operation will RESUME from the point where it died.

If this is the case, maybe I can write a script that reboots the machine after
a fixed amount of time, and have it run on every boot:

#!/bin/sh
zpool import xx &        # the import hangs while the destroy resumes
sleep 20                 # give the import a moment to get going
rm /etc/zfs/zpool.cache  # so the pool isn't auto-imported if the kernel dies
sleep 1800               # let the destroy make progress for 30 minutes
reboot                   # reboot before the kernel runs out of memory
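
To have it run on every boot I suppose I could install it as a legacy rc
script (SMF would be the cleaner way, but this is a hack anyway; the script
name here is made up):

  cp reboot-loop.sh /etc/rc3.d/S99zfsreboot   # S99 so it runs late at boot
  chmod +x /etc/rc3.d/S99zfsreboot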

This would work under the assumption that the list of blocks to be freed is
flushed to the ZIL (or somewhere on disk) before the reboot, so the operation
can resume from the same point. It's a very nasty hack, and it would only work
very slowly: zpool iostat shows about 1 MB/s of reads while the destroy is
running. The dataset in question is 450 GB, which means the operation would
take about 5 days to finish if it needs to read the whole dataset to destroy
it, or about 7 days if it also needs to walk the other snapshots (600 GB
total).
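
The back-of-the-envelope math, in shell (assuming the 1 MB/s read rate holds
and every block has to be read once):

  # GB * 1024 = MB = seconds at 1 MB/s; 86400 seconds per day
  echo "$(( 450 * 1024 / 86400 )) days"   # ~5 days for the dataset alone
  echo "$(( 600 * 1024 / 86400 )) days"   # ~7 days including the other snapshots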

So my only viable option seems to be to "abort" the destroy. How can I do
that? Disable the ZIL, maybe? Delete the ZIL? Scrub afterwards?

Thanks,
Hernán