I should add that the purpose of the pipe is to avoid two problems. The first is iteration taking arbitrarily long because userland has to allocate enough memory up front and then races with the kernel as the memory requirement grows while it iterates. The second is things hanging because there are too many pools, or pools too large, to list.
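To illustrate the point about the pipe, here is a minimal sketch in Python rather than the actual libzfs_core code. The names `produce` and `list_via_pipe`, the dataset names, and the one-record-per-line format are all hypothetical, and a thread stands in for the kernel side. It only shows the general pattern: the consumer reads through a fixed-size buffer, so it never has to guess a total size in advance or retry when the amount of data grows.

```python
import os
import threading


def produce(write_fd, names):
    """Hypothetical stand-in for the kernel side: stream one record per line."""
    # Closing the file object closes the write end, which signals EOF.
    with os.fdopen(write_fd, "w") as out:
        for name in names:
            out.write(name + "\n")


def list_via_pipe(names, chunk_size=64):
    """Read records from a pipe using a fixed-size read buffer.

    Returns the records seen and the largest single read, to show that
    each read stays bounded no matter how many records are produced.
    """
    read_fd, write_fd = os.pipe()
    producer = threading.Thread(target=produce, args=(write_fd, names))
    producer.start()

    seen = []
    pending = b""  # bytes of a record that is not yet complete
    largest_read = 0
    try:
        while True:
            chunk = os.read(read_fd, chunk_size)
            if not chunk:  # EOF: the producer closed its end
                break
            largest_read = max(largest_read, len(chunk))
            pending += chunk
            # Everything before the last newline is a complete record.
            *complete, pending = pending.split(b"\n")
            seen.extend(line.decode() for line in complete)
    finally:
        os.close(read_fd)
        producer.join()
    return seen, largest_read
```

When the pipe fills up, the producer simply blocks until the consumer reads more, so neither side needs a buffer proportional to the number of datasets. That blocking is also why holding locks on the producer side while writing lets userland stall it for arbitrarily long.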

On 29 May 2015 at 11:06, Richard Yao <[email protected]> wrote:

> Dear Matt,
>
> What I am trying to solve is the absence of a sane way to get the state in a
> manner similar to `zfs list` from libzfs_core. The current API is
> non-atomic and I was concerned about memory utilization on large/many
> pools, so I wrote a lzc_list that operated using a pipe while holding
> locks. When porting that to Illumos, I realized that these locks could be
> held arbitrarily long by userland, so I came up with a second approach that
> did holds. Unfortunately, this led the code to use linear memory with the
> number of datasets and hurt atomicity, so I came up with the idea of
> pinning the last txg and using that as output.
>
> There seems to be no way to list datasets in a way that is simultaneously
> atomic, non-blocking, memory efficient and consistent (where we get the
> latest state). My latest thoughts are to implement lzc_list with an
> "atomic" toggle that does the first way that I implemented things as a
> privileged operation that by default requires root in the global zone on
> Illumos and pushes the non-atomic code we have now into the kernel when it
> is not.
>
> What do you think?
>
> Yours truly,
> Richard Yao
>
>
> On 27 May 2015 at 17:08, Matthew Ahrens <[email protected]> wrote:
>
>> On Wed, May 27, 2015 at 7:23 PM, Richard Yao <[email protected]>
>> wrote:
>>
>>> Dear Everyone,
>>>
>>> As some people know, I have been working on libzfs_core extensions and I
>>> currently have a prototype lzc_list command that is a large subset of the
>>> functionality of `zfs list`.
>>>
>>> After discussing it with others, I suspect that implementing an in-core
>>> pool metadata snapshot facility for lzc_list would be the most natural way
>>> of implementing lzc_list. The in-core pool snapshot would atomically pin
>>> the metadata state of a pool on disk for `zfs list` without holding locks
>>> while it is traversed (my first implementation) or requiring that we pin
>>> memory via holds on dsl_dataset_t objects until the operation is finished
>>> (my second implementation). While the in-core pool metadata snapshot is in
>>> effect, the blocks containing the pool metadata would not be marked free in
>>> the in-core free space_map, but they would be marked free in the on-disk
>>> space map when they would normally be marked as such. That has the downside
>>> that disk space would not be freed right away, but we make no guarantees of
>>> immediately freeing disk space anyway, so I suspect that is okay.
>>>
>>> Would this be something entirely new or is there already a way to
>>> snapshot the pool's metadata state in-core either of which I am unaware or
>>> in a branch somewhere?
>>>
>>
>> Let's say for the sake of argument that we don't overwrite anything on
>> disk that you care about. What else do you need to do? I'm imagining that
>> you will have a separate idea of the metadata state (what datasets exist,
>> their properties and interrelations, etc), which is out of date from what's
>> really on disk. How do you maintain that? It seems nontrivial.
>>
>> Maybe you could start by describing the problem that you are solving? It
>> sounds like you want an atomic view of the pool metadata (that's used by
>> "zfs list")?
>>
>> --matt
>>

_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
