Hi Boris,
I may be able to shed some light on this for you.
When ZFS was originally ported to Linux I made all zvols (including
snapshots) available as block devices under /dev. At the time I wasn't
aware that OpenSolaris/FreeBSD only presented the writable datasets and
not all the snapshots.
Some users found having easy read-only access to the zvol snapshots
convenient. Others didn't care for all the extra devices in /dev.
Since there were legitimate use cases for both behaviors we added the
'snapdev' property to control the visibility of zvol snapshots.
However, since the other OpenZFS implementations never create block
devices for snapshots I could certainly see how there might be scaling
issues here which we need to investigate.
It sounds like you've already correctly identified a few of the
bottlenecks which need to be addressed. Since this issue really is
specific to ZoL it would be best if you could open an issue on our
tracker. Then we could work through how best to resolve it.
https://github.com/zfsonlinux/zfs/issues/new
I did take a look at the code and the dmu_objset_find() traversal at the
end of zfs_ioc_snapshot() looks redundant to me as well. This seems
like something we should be able to get rid of since we only need to
create the specific minors for the new snapshots.
Thanks,
Brian
On 03/18/14 15:30, Boris wrote:
Hello,
I came across what seems to be a scalability issue with snapshots in ZFS
on Linux. When many snapshots are taken concurrently (I tried 400), it
takes a long time (minutes) for each 'zfs snapshot' command to complete
(although they all do execute concurrently). I believe I tracked down
the problem to the invocation of zvol_create_minors(poolname) in
zfs_ioc_snapshot() before the latter exits.
A cursory look at the function seems to indicate that the call is there
to facilitate the 'snapdev' feature that appears to create block devices
for snapshots of zvols that have the snapdev=visible property set. It
surely seems nice to have read-only devices created for snapshots
automagically (if I understand the intent correctly). But it seems that
scanning the pool namespace (including all the snapshots) is a bit of a
heavyweight operation.
Part of the issue appears to be that the callbacks from
dmu_objset_find() take the global zvol_state_lock. I have moved the
__zvol_snapdev_hidden() call out from under the lock to limit the impact, but
that did not address the problem sufficiently. It appears that just
scanning all the snapshots of all the datasets in the pool while
creating other snapshots and clones gets sufficiently slow for 400 zvols
and 10,000 total snapshots.
I believe dsl_dataset_snapshot() already calls zvol_create_minors() for
the snapshots under processing. Does anyone know why it is necessary to
perform the pool-wide scan at the end of zfs_ioc_snapshot()?
Best regards, Boris.
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer