Dear Matt,

The problem I am trying to solve is the absence of a sane way to get the state in a manner similar to `zfs list` from libzfs_core. The current API is non-atomic and I was concerned about memory utilization on large/many pools, so I wrote an lzc_list that operated using a pipe while holding locks. When porting that to Illumos, I realized that these locks could be held arbitrarily long by userland, so I came up with a second approach that took holds. Unfortunately, this led the code to use memory linear in the number of datasets and hurt atomicity, so I came up with the idea of pinning the last txg and using that as the output.
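The pipe trade-off can be seen in userland alone: a writer into a pipe can only make progress while the reader drains it, so anything the writer holds (locks, in the kernel implementation) stays held for as long as the reader chooses. What follows is a hypothetical userland model, not the lzc_list prototype; `struct ds_record`, `produce_until_full()`, `consume_all()` and `demo_backpressure()` are invented for illustration. It marks the write end non-blocking so that the point where a blocking writer would sleep shows up as `EAGAIN`, and the reader uses a single record-sized buffer however many records arrive, which is the memory property the pipe approach was after.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-size record standing in for one dataset's entry. */
struct ds_record {
	char name[56];
	unsigned long long txg;
};

/*
 * Producer: write records until max_records is reached or the pipe is
 * full. Returns the number written, or -1 on an unexpected error.
 */
static long
produce_until_full(int wfd, long max_records)
{
	long n;

	for (n = 0; n < max_records; n++) {
		struct ds_record r;
		ssize_t w;

		memset(&r, 0, sizeof (r));
		(void) snprintf(r.name, sizeof (r.name), "tank/ds%ld", n);
		r.txg = (unsigned long long)n;
		w = write(wfd, &r, sizeof (r));
		if (w == (ssize_t)sizeof (r))
			continue;
		/* The pipe is full: a blocking writer would sleep here. */
		if (w < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			return (n);
		return (-1);
	}
	return (n);
}

/* Consumer: one record-sized buffer, however many records arrive. */
static long
consume_all(int rfd)
{
	struct ds_record r;
	long n = 0;

	for (;;) {
		ssize_t got = read(rfd, &r, sizeof (r));

		if (got == (ssize_t)sizeof (r))
			n++;
		else if (got == 0)
			return (n);	/* EOF: the producer closed its end */
		else
			return (-1);
	}
}

/*
 * Fill the pipe with no reader draining it, then drain it. Returns how
 * many records fit before the writer would have blocked, or -1.
 */
static long
demo_backpressure(long max_records, long *drained)
{
	int fds[2];
	int flags;
	long written;

	assert(drained != NULL);
	if (pipe(fds) != 0)
		return (-1);
	flags = fcntl(fds[1], F_GETFL);
	if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) != 0) {
		(void) close(fds[0]);
		(void) close(fds[1]);
		return (-1);
	}
	written = produce_until_full(fds[1], max_records);
	(void) close(fds[1]);
	*drained = consume_all(fds[0]);
	(void) close(fds[0]);
	return (written);
}
```

The exact count depends on the platform's pipe capacity; the point is only that it is bounded, so the producer's progress (and anything it holds) is gated on the consumer.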
There seems to be no way to list datasets in a way that is simultaneously atomic, non-blocking, memory efficient and consistent (where we get the latest state). My latest thought is to implement lzc_list with an "atomic" toggle. When it is set, lzc_list would use my first implementation as a privileged operation that by default requires root in the global zone on Illumos; when it is not set, it would run the non-atomic code we have now, moved into the kernel. What do you think?

Yours truly,
Richard Yao

On 27 May 2015 at 17:08, Matthew Ahrens wrote:
>
>
> On Wed, May 27, 2015 at 7:23 PM, Richard Yao
> wrote:
>
>> Dear Everyone,
>>
>> As some people know, I have been working on libzfs_core extensions and I
>> currently have a prototype lzc_list command that is a large subset of the
>> functionality of `zfs list`.
>>
>> After discussing it with others, I suspect that implementing an in-core
>> pool metadata snapshot facility for lzc_list would be the most natural way
>> of implementing lzc_list. The in-core pool snapshot would atomically pin
>> the metadata state of a pool on disk for `zfs list` without holding locks
>> while it is traversed (my first implementation) or requiring that we pin
>> memory via holds on dsl_dataset_t objects until the operation is finished
>> (my second implementation). While the in-core pool metadata snapshot is in
>> effect, the blocks containing the pool metadata would not be marked free in
>> the in-core free space_map, but they would be marked free in the on-disk
>> space map when they would normally be marked as such. That has the downside
>> that disk space would not be freed right away, but we make no guarantees of
>> immediately freeing disk space anyway, so I suspect that is okay.
>>
>> Would this be something entirely new, or is there already a way to
>> snapshot the pool's metadata state in-core that I am unaware of, either
>> in the tree or in a branch somewhere?
>>
>
> Let's say for the sake of argument that we don't overwrite anything on
> disk that you care about. What else do you need to do? I'm imagining that
> you will have a separate idea of the metadata state (what datasets exist,
> their properties and interrelations, etc.), which is out of date from what's
> really on disk. How do you maintain that? It seems nontrivial.
>
> Maybe you could start by describing the problem that you are solving? It
> sounds like you want an atomic view of the pool metadata (that's used by
> "zfs list")?
>
> --matt
>
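Matt grants the free-space half of the proposal for the sake of argument; the sketch here models only that half, to make the proposed rule concrete. It is a single-threaded toy with invented names (`struct toy_pool`, `toy_pin()`, `toy_free()` and so on), not ZFS's space map code. Deferring only blocks born at or before the pinned txg is an assumption of the sketch, mirroring how ZFS compares a freed block's birth txg against the most recent snapshot's txg; the proposal itself only says pool metadata blocks would not be marked free in-core. It does not touch the harder question Matt raises, namely maintaining a separate, out-of-date view of the metadata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	TOY_NBLOCKS	16

/* Hypothetical toy pool; none of these fields exist in ZFS. */
struct toy_pool {
	uint64_t cur_txg;		/* txg currently open for changes */
	bool pinned;			/* is a lister holding a snapshot? */
	uint64_t pin_txg;		/* last txg the lister may read */
	bool incore_free[TOY_NBLOCKS];	/* allocator may hand it out */
	bool ondisk_free[TOY_NBLOCKS];	/* what the on-disk map records */
	uint64_t birth[TOY_NBLOCKS];	/* txg in which block was written */
	int deferred[TOY_NBLOCKS];	/* frees held back while pinned */
	size_t ndeferred;
};

static void
toy_init(struct toy_pool *p)
{
	memset(p, 0, sizeof (*p));
	p->cur_txg = 1;
	for (int i = 0; i < TOY_NBLOCKS; i++) {
		p->incore_free[i] = true;
		p->ondisk_free[i] = true;
	}
}

/* Allocate from the in-core map only; returns -1 when nothing is free. */
static int
toy_alloc(struct toy_pool *p)
{
	for (int i = 0; i < TOY_NBLOCKS; i++) {
		if (p->incore_free[i]) {
			p->incore_free[i] = false;
			p->ondisk_free[i] = false;
			p->birth[i] = p->cur_txg;
			return (i);
		}
	}
	return (-1);
}

/* Pin the state as of the current txg; toy_sync() moves the pool on. */
static void
toy_pin(struct toy_pool *p)
{
	p->pinned = true;
	p->pin_txg = p->cur_txg;
}

static void
toy_sync(struct toy_pool *p)
{
	p->cur_txg++;
}

static void
toy_free(struct toy_pool *p, int b)
{
	/* The on-disk map is updated when it normally would be. */
	p->ondisk_free[b] = true;
	if (p->pinned && p->birth[b] <= p->pin_txg) {
		/* The pinned view may still read this block: do not reuse it. */
		assert(p->ndeferred < TOY_NBLOCKS);
		p->deferred[p->ndeferred++] = b;
	} else {
		/* Not pinned, or born after the pin: no reader needs it. */
		p->incore_free[b] = true;
	}
}

/* Release the pin; held-back blocks become allocatable again. */
static void
toy_unpin(struct toy_pool *p)
{
	for (size_t i = 0; i < p->ndeferred; i++)
		p->incore_free[p->deferred[i]] = true;
	p->ndeferred = 0;
	p->pinned = false;
}
```

A deferred block stays unallocatable until the pin is released even though the on-disk map already calls it free, which is the "disk space would not be freed right away" downside the proposal accepts.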
_______________________________________________
developer mailing list
http://lists.open-zfs.org/mailman/listinfo/developer
