On Fri, May 29, 2015 at 8:09 AM, Richard Yao <[email protected]> wrote:
> I should add that the purpose of the pipe is to avoid situations where
> iteration takes arbitrarily long due to needing to allocate enough memory
> in userland and having the kernel/userspace race with increases in memory
> requirements as it iterates (or having things hang from too many/large
> pools to list).

How much memory are we talking about? (A few MB? A few GB?) I don't see
what would be racy about the situation you described.

> On 29 May 2015 at 11:06, Richard Yao <[email protected]> wrote:
>
>> Dear Matt,
>>
>> The problem I am trying to solve is the absence of a sane way to get the
>> state in a manner similar to `zfs list` from libzfs_core.
>>
If you're OK with the semantics of "zfs list" (everything it tells you was
true at some point during the list operation), you could do the same thing
with libzfs_core.

>> The current API is non-atomic and I was concerned about memory
>> utilization on large/many pools, so I wrote an lzc_list that operated
>> using a pipe while holding locks. When porting that to Illumos, I
>> realized that these locks could be held arbitrarily long by userland, so
>> I came up with a second approach that used holds. Unfortunately, this
>> led the code to use memory linear in the number of datasets and hurt
>> atomicity, so I came up with the idea of pinning the last txg and using
>> that as output.
>>
I don't really know what you mean by "pinning the last txg". (Beyond not
allowing its blocks to be overwritten -- see the questions in my previous
email.)

>> There seems to be no way to list datasets in a way that is simultaneously
>> atomic, non-blocking, memory-efficient, and consistent (where we get the
>> latest state).
>>
That is indeed nontrivial. For example, there is no way to do that for
listing a directory hierarchy. Can you describe exactly what each of those
requirements means (e.g. what is the difference between "atomic" and
"consistent")? I can guess at the others, but it would be best to lay out
your requirements explicitly.
>> My latest thoughts are to implement lzc_list with an "atomic" toggle
>> that does things the first way that I implemented them
>>
Meaning it grabs locks to prevent create/destroy/rename operations from
taking place while the list is in progress?

>> as a privileged operation that by default requires root in the global
>> zone on Illumos, and pushes the non-atomic code we have now into the
>> kernel when it is not.
>>
>> What do you think?
>>
>> Yours truly,
>> Richard Yao
>>
>> On 27 May 2015 at 17:08, Matthew Ahrens <[email protected]> wrote:
>>
>>> On Wed, May 27, 2015 at 7:23 PM, Richard Yao <[email protected]>
>>> wrote:
>>>
>>>> Dear Everyone,
>>>>
>>>> As some people know, I have been working on libzfs_core extensions,
>>>> and I currently have a prototype lzc_list command that covers a large
>>>> subset of the functionality of `zfs list`.
>>>>
>>>> After discussing it with others, I suspect that implementing an
>>>> in-core pool metadata snapshot facility would be the most natural way
>>>> of implementing lzc_list. The in-core pool snapshot would atomically
>>>> pin the metadata state of a pool on disk for `zfs list` without
>>>> holding locks while it is traversed (my first implementation) or
>>>> requiring that we pin memory via holds on dsl_dataset_t objects until
>>>> the operation is finished (my second implementation). While the
>>>> in-core pool metadata snapshot is in effect, the blocks containing the
>>>> pool metadata would not be marked free in the in-core space map, but
>>>> they would be marked free in the on-disk space map when they would
>>>> normally be marked as such. That has the downside that disk space
>>>> would not be freed right away, but we make no guarantees of
>>>> immediately freeing disk space anyway, so I suspect that is okay.
>>>>
>>>> Would this be something entirely new, or is there already a way to
>>>> snapshot the pool's metadata state in-core, either one I am unaware of
>>>> or one in a branch somewhere?
>>>>
>>> Let's say for the sake of argument that we don't overwrite anything on
>>> disk that you care about. What else do you need to do? I'm imagining
>>> that you will have a separate idea of the metadata state (what datasets
>>> exist, their properties and interrelations, etc.), which is out of date
>>> from what's really on disk. How do you maintain that? It seems
>>> nontrivial.
>>>
>>> Maybe you could start by describing the problem that you are solving?
>>> It sounds like you want an atomic view of the pool metadata (that's
>>> used by "zfs list")?
>>>
>>> --matt
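The "true at some point during the list operation" semantics that Matt
describes can be illustrated with a toy cursor-style walker. This is pure
Python with made-up dataset names; the real kernel iterators resume from a
cookie in a similar spirit, but nothing below is actual ZFS code:

```python
def list_next(namespace, cookie):
    """Toy model of non-atomic, cursor-based listing: return
    (name, new_cookie) for the first entry sorting after `cookie`,
    or (None, cookie) when iteration is done."""
    for name in sorted(namespace):
        if name > cookie:
            return name, name
    return None, cookie

namespace = {"tank/a", "tank/m", "tank/z"}
seen, cookie = [], ""
while True:
    name, cookie = list_next(namespace, cookie)
    if name is None:
        break
    seen.append(name)
    if name == "tank/m":
        # Simulate concurrent creates mid-listing: one behind the cursor
        # (missed by this walk) and one ahead of it (picked up).
        namespace.add("tank/b")
        namespace.add("tank/q")
print(seen)
```

Every name returned existed at the moment the cursor passed it, but the
walk as a whole never corresponds to any single point-in-time state of the
namespace -- which is exactly the non-atomicity being discussed.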
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
