On Fri, May 29, 2015 at 8:09 AM, Richard Yao <[email protected]>
wrote:

> I should add that the purpose of the pipe is to avoid situations where
> iteration takes arbitrarily long due to needing to allocate enough memory
> in userland and having the kernel/userspace race with increases in memory
> requirements as it iterates (or having things hang from too many/large
> pools to list).
>

How much memory are we talking about?  (few MB?  few GB?)

I don't see what would be racy about the situation you described.
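One reading of the race Richard describes at the top of the thread is the usual size-query-and-retry loop. Userland allocates a buffer, and the kernel fails with ENOMEM and reports the size it needed. By the time userland retries, more datasets may exist. The minimal C simulation below illustrates this. `fake_kernel_list()`, `RECORD_SIZE` and the growing dataset counter are hypothetical stand-ins, not libzfs_core calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define RECORD_SIZE 64	/* hypothetical bytes per dataset record */

/* Hypothetical pool state: dataset count and how fast it grows. */
static size_t g_ndatasets = 100;
static size_t g_growth_per_call = 10;

/*
 * Hypothetical kernel entry point.  Fills buf with one record per dataset,
 * or fails with ENOMEM and reports the size needed at that moment.
 */
static int
fake_kernel_list(char *buf, size_t bufsize, size_t *needed)
{
	size_t required = g_ndatasets * RECORD_SIZE;

	*needed = required;
	if (bufsize < required) {
		/* Datasets keep being created while userland reallocates. */
		g_ndatasets += g_growth_per_call;
		return (ENOMEM);
	}
	if (required > 0)
		memset(buf, 0, required);
	return (0);
}

/*
 * Userland side: resize to the reported size and retry.  Returns the
 * number of attempts that were needed, or -1 if max_tries ran out.
 */
static int
list_with_retry(int max_tries)
{
	char *buf = NULL;
	size_t size = 0, needed = 0;

	for (int attempt = 1; attempt <= max_tries; attempt++) {
		if (fake_kernel_list(buf, size, &needed) == 0) {
			free(buf);
			return (attempt);
		}
		/* The size we just learned may already be stale. */
		char *nbuf = realloc(buf, needed);
		if (nbuf == NULL) {
			free(buf);
			return (-1);
		}
		buf = nbuf;
		size = needed;
	}
	free(buf);
	return (-1);
}
```

With `g_growth_per_call` set to 0 the second attempt succeeds. With steady growth, every retry asks for a size that is already out of date, which is the unbounded iteration the pipe was meant to avoid. Over-allocating (for example, doubling `needed`) usually ends the loop in practice but does not bound it.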


>
> On 29 May 2015 at 11:06, Richard Yao <[email protected]> wrote:
>
>> Dear Matt,
>>
>> What I am trying to solve is the absence of a sane way to get the state in a
>> manner similar to `zfs list` from libzfs_core.
>>
>
If you're OK with the semantics of "zfs list" (everything it tells you was
true at some point during the list operation), you could do the same thing
with libzfs_core.
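The "true at some point during the list operation" semantics can be obtained with a stateless cursor. Each call returns the next dataset name that sorts after the previous one, and nothing is held between calls. The following is a minimal sketch over an in-memory sorted name table. `next_after()`, `ds_create()` and the table itself are hypothetical illustrations, not the libzfs_core interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXDS 16

/* Hypothetical in-memory stand-in for the pool's dataset namespace (kept sorted). */
static const char *g_names[MAXDS];
static size_t g_count;

/* Insert a name, keeping g_names sorted; models a concurrent "zfs create". */
static void
ds_create(const char *name)
{
	size_t i = g_count;

	assert(g_count < MAXDS);
	while (i > 0 && strcmp(g_names[i - 1], name) > 0) {
		g_names[i] = g_names[i - 1];
		i--;
	}
	g_names[i] = name;
	g_count++;
}

/*
 * Return the first name sorting strictly after cursor (or the first name
 * if cursor is NULL), or NULL at the end.  No state survives between
 * calls, so each answer reflects the namespace at the time of that call.
 */
static const char *
next_after(const char *cursor)
{
	for (size_t i = 0; i < g_count; i++) {
		if (cursor == NULL || strcmp(g_names[i], cursor) > 0)
			return (g_names[i]);
	}
	return (NULL);
}
```

If `pool/b` is created after the cursor has passed `pool/a`, it is still reported because it sorts later. A dataset created under a name that sorts before the cursor would be missed, and one destroyed before the cursor reaches it would not appear. Each name returned was nevertheless valid when it was returned, which matches the guarantee Matt describes for `zfs list`.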


>> The current API is non-atomic and I was concerned about memory utilization
>> on large/many pools, so I wrote an lzc_list that operated using a pipe while
>> holding locks. When porting that to Illumos, I realized that these locks
>> could be held arbitrarily long by userland, so I came up with a second
>> approach that used holds. Unfortunately, this led the code to use memory
>> linear in the number of datasets and hurt atomicity, so I came up with
>> the idea of pinning the last txg and using that as output.
>>
>
I don't really know what you mean by "pinning the last txg".  (Beyond not
allowing its blocks to be overwritten -- see the questions in my previous
email.)
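Richard's first implementation, quoted above, streamed results through a pipe while holding locks. The sketch below shows why that bounds userland memory but not lock hold time. The producer writes fixed-size records into a pipe, and the consumer reads one record at a time, so it needs only one record of buffer. However, a blocking `write()` stalls the producer whenever the reader falls behind. A forked child stands in for the kernel here. The record layout and `stream_list()` are hypothetical, and no real locks are taken; the comments only mark where they would be held.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fixed-size record; not the real lzc_list wire format. */
struct ds_record {
	char name[64];
	unsigned long long used;
};

/*
 * Producer ("kernel" side).  In the locked design, locks would be held
 * for this whole loop, and write() blocks whenever the pipe is full,
 * so the hold time depends on how fast the reader drains the pipe.
 */
static void
produce(int fd, int n)
{
	struct ds_record rec;

	for (int i = 0; i < n; i++) {
		memset(&rec, 0, sizeof (rec));
		snprintf(rec.name, sizeof (rec.name), "pool/ds%d", i);
		rec.used = (unsigned long long)i * 4096;
		if (write(fd, &rec, sizeof (rec)) != (ssize_t)sizeof (rec))
			_exit(1);
	}
}

/*
 * Consumer (userland side): memory use is one record regardless of how
 * many datasets exist.  Returns the number of records read, or -1.
 */
static int
stream_list(int n)
{
	int fds[2], count = 0, status;
	struct ds_record rec;
	pid_t pid;

	if (pipe(fds) != 0)
		return (-1);
	if ((pid = fork()) == -1) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	if (pid == 0) {
		close(fds[0]);
		produce(fds[1], n);
		close(fds[1]);
		_exit(0);
	}
	close(fds[1]);
	for (;;) {
		size_t got = 0;

		/* read() may return short counts on a pipe; assemble a full record. */
		while (got < sizeof (rec)) {
			ssize_t r = read(fds[0], (char *)&rec + got,
			    sizeof (rec) - got);
			if (r <= 0)
				break;	/* EOF or error */
			got += (size_t)r;
		}
		if (got < sizeof (rec))
			break;
		count++;
	}
	close(fds[0]);
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0)
		return (-1);
	return (count);
}
```

`stream_list(5000)` pushes several hundred kilobytes through the pipe, more than a default Linux pipe buffer (64 KiB) holds. The producer therefore blocks in `write()` until the reader catches up. If locks were held across that loop, a slow or stopped reader would stall everyone waiting on them.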


>
>> There seems to be no way to list datasets in a way that is simultaneously
>> atomic, non-blocking, memory efficient and consistent (where we get the
>> latest state).
>>
>
That is indeed nontrivial.  For example, there's no way to do that for
listing a directory hierarchy.  Can you describe what each of those
requirements is exactly (e.g. what is the difference between "atomic" and
"consistent"?)  I can guess at the others but it would be best to lay out
your requirements explicitly.

>> My latest thoughts are to implement lzc_list with an "atomic" toggle that
>> uses the first approach I implemented
>>
>
Meaning it grabs locks to prevent create/destroy/rename operations from
taking place while the list is in progress?

>> as a privileged operation that by default requires root in the global zone
>> on Illumos and pushes the non-atomic code we have now into the kernel when
>> it is not.
>>
>> What do you think?
>>
>> Yours truly,
>> Richard Yao
>>
>>
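The "atomic" toggle Richard proposes amounts to a small dispatch. Under one reading of his message, an atomic listing is allowed only for privileged callers (by default, root in the global zone). Otherwise, the existing non-atomic walk runs in the kernel. The following is a sketch of that decision. The enum, the `struct caller` fields and `choose_list_mode()` are all hypothetical, and the privilege check is reduced to two booleans.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical listing strategies discussed in the thread. */
enum list_mode {
	LIST_NONATOMIC,	/* current iteration, moved into the kernel */
	LIST_ATOMIC	/* locks held for the whole listing */
};

/* Hypothetical caller credentials, reduced to the two facts that matter here. */
struct caller {
	bool is_root;
	bool in_global_zone;
};

/*
 * Pick a strategy for a hypothetical lzc_list() request.  Returns 0 and
 * sets *mode, or EPERM when an atomic listing is requested by a caller
 * that is not root in the global zone (the proposed default policy).
 */
static int
choose_list_mode(const struct caller *cr, bool want_atomic,
    enum list_mode *mode)
{
	if (!want_atomic) {
		*mode = LIST_NONATOMIC;
		return (0);
	}
	if (!(cr->is_root && cr->in_global_zone))
		return (EPERM);
	*mode = LIST_ATOMIC;
	return (0);
}
```

Gating the atomic mode on privilege limits who can hold the listing locks for arbitrarily long, but it does not bound the hold time for a privileged caller.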
>> On 27 May 2015 at 17:08, Matthew Ahrens <[email protected]> wrote:
>>
>>>
>>>
>>> On Wed, May 27, 2015 at 7:23 PM, Richard Yao <[email protected]>
>>> wrote:
>>>
>>>> Dear Everyone,
>>>>
>>>> As some people know, I have been working on libzfs_core extensions and
>>>> I currently have a prototype lzc_list command that implements a large
>>>> subset of the functionality of `zfs list`.
>>>>
>>>> After discussing it with others, I suspect that implementing an in-core
>>>> pool metadata snapshot facility for lzc_list would be the most natural way
>>>> of implementing lzc_list. The in-core pool snapshot would atomically pin
>>>> the metadata state of a pool on disk for `zfs list` without holding locks
>>>> while it is traversed (my first implementation) or requiring that we pin
>>>> memory via holds on dsl_dataset_t objects until the operation is finished
>>>> (my second implementation). While the in-core pool metadata snapshot is in
>>>> effect, the blocks containing the pool metadata would not be marked free in
>>>> the in-core free space_map, but they would be marked free in the on-disk
>>>> space map when it would normally be marked as such. That has the downside
>>>> that disk space would not be freed right away, but we make no guarantees of
>>>> immediately freeing disk space anyway, so I suspect that is okay.
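The pinning scheme Richard describes can be modeled as a deferred-free list. While a metadata snapshot is active, freed blocks are recorded as free in the on-disk map as usual but withheld from the in-core allocatable set. They are released when the last pin is dropped. The following is a toy model using block numbers in fixed-size arrays. Every name in it is hypothetical and is not the actual space_map code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 32

/* Toy pool: which blocks the on-disk map and the in-core allocator consider free. */
static bool ondisk_free[NBLOCKS];
static bool incore_free[NBLOCKS];
static int pin_count;			/* active in-core metadata snapshots */
static size_t deferred[NBLOCKS];	/* blocks freed while pinned */
static size_t ndeferred;

static void
pin_metadata(void)
{
	pin_count++;
}

/* Free a block: always record it on disk; hide it from the allocator while pinned. */
static void
free_block(size_t blk)
{
	assert(blk < NBLOCKS);
	ondisk_free[blk] = true;
	if (pin_count > 0) {
		assert(ndeferred < NBLOCKS);
		deferred[ndeferred++] = blk;
	} else {
		incore_free[blk] = true;
	}
}

/* Dropping the last pin hands the deferred blocks back to the allocator. */
static void
unpin_metadata(void)
{
	assert(pin_count > 0);
	if (--pin_count == 0) {
		for (size_t i = 0; i < ndeferred; i++)
			incore_free[deferred[i]] = true;
		ndeferred = 0;
	}
}
```

This makes the cost Richard mentions concrete. Space freed while a snapshot is pinned becomes allocatable again only after `unpin_metadata()` drops the last pin.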
>>>>
>>>> Would this be something entirely new, or is there already a way to
>>>> snapshot the pool's metadata state in-core, either one that I am unaware
>>>> of or one in a branch somewhere?
>>>>
>>>>
>>> Let's say for the sake of argument that we don't overwrite anything on
>>> disk that you care about.  What else do you need to do?  I'm imagining that
>>> you will have a separate idea of the metadata state (what datasets exist,
>>> their properties and interrelations, etc), which is out of date from what's
>>> really on disk.  How do you maintain that?  It seems nontrivial.
>>>
>>> Maybe you could start by describing the problem that you are solving?
>>> It sounds like you want an atomic view of the pool metadata (that's used by
>>> "zfs list")?
>>>
>>> --matt
>>>
>>
>>
>
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
