Philip Guenther via austin-group-l at The Open Group wrote in <CAKKmsNif0Fd+0V4dF418rwkS2crjqLvJh7vNhEcP4rJQ=sw...@mail.gmail.com>: |On Tue, Sep 1, 2020 at 1:20 PM Steffen Nurpmeso <[email protected]> wrote: |> Philip Guenther wrote in |> <cakkmsnhee-umta17hut19dqttnj-tm3aqpngimk-bplehq_...@mail.gmail.com>: |>|On Tue, Sep 1, 2020 at 6:22 AM Steffen Nurpmeso via austin-group-l at The |>|Open Group <[email protected]> wrote: |>|> Robert Elz via austin-group-l at The Open Group wrote in |>|> <[email protected]>: |>|>| Date: Tue, 1 Sep 2020 10:32:55 +0100 |>|>| From: "Geoff Clare via austin-group-l at The Open Group" \ |>|>| <[email protected]> |>|>| Message-ID: <20200901093255.GA7629@localhost> |>|> ... |>|>|What's more important is what happens if the application buffer isn't |>|>|big enough for the next entry. What do the existing getdents() |>|>|implementations do in that case? If they're all the same then |> .. |>|> Isn't that covered nicely by the posted text? There must be space |>|> for at least one entry, otherwise EINVAL occurs? And upon success |> .. |>|A quick review of FreeBSD, NetBSD, and OpenBSD finds they all return |> EINVAL |>|if the buffer isn't "big enough". |>|For OpenBSD, the minimum buffer size is 512 bytes; if I'm reading it |>|correctly NetBSD is similar, possibly varying based on filesystem |>|formatting. |>|FreeBSD requires space for the next entry. |> |> They document that "the size must be greater than or equal to the |> block size associated with the file", but i cannot find this "next |> entry" of yours? |> | |What document are you quoting from?
$ git show origin/master:lib/libc/sys/getdirentries.2|mandoc|less It uses "next" only in conjunction with seeking or entry hopping. But it is likely you used "next" to mean "at least one", then this is just a misunderstanding. |I actually tested the behavior of FreeBSD 11.3 getdirentries(2), after |looking at the code. |The manpage on that system says: ..EINVAL, yes.. |... |>|At least NetBSD and OpenBSD will return an entry with d_ino == 0 if the |>|first entry in a block is removed. I suspect others may do this as well; |>|glibc at least includes code to skip such entries in its generic |> readdir() |>|implementation. |>| |>|The question really is "is this supposed to be a API that can be |> trivially |>|supported by all the existing versions, even if that makes it more clunky |>|to use, or should it be easy to use even if every single existing |>|implementation needs to bend?" |> |> But .. it is already supported by all, and it has always been |> used?! | |As I've been describing, that is not true for at least NetBSD and OpenBSD: | * they both require the buffer to be some size larger than a single entry | * they both can return entries with d_ino == 0 and d_name that doesn't |correspond | to a file in the directory | |"The same, but different" means "NOT THE SAME". Code that follows the |proposed description on those points would not behaved as expected if used |with NetBSD's or OpenBSD's getdents(2). All i can say is that i would skip those entries. There is no undeletion facility, so passing as much information as possible about directory content is fine, but it must be somehow useful in the end; and then specific programs which include very specific headers and use very specific ioctls can do _that_ job. My opinion. ... |>|If the former, then define a minimum buffer size |>|(pathconf(_PC_DIRBUFMIN)...?), permit d_ino==0 as entries where d_name |> and |> |> However there is _PC_NAME_MAX already, and so the number must be |> nearby, no? Isn't that overengineering? | |The buffers required by NetBSD and OpenBSD are larger than and unrelated to |the value returned by pathconf(_PC_NAME_MAX) and instead related to a block |size of the filesystem. In fact i personally would not call this function with less than a page full of memory, conditionally more. That reminds me that it is a pity that there is no "get_usable_size" for a size and "get_usable_size_ptr" (or so, overloading not in C) for malloc. |>|d_type are unspecified, and let d_name be either a fixed fix array or a |>|flexible array member. |> |> You know, my personal position would be to just skip those entries |> when copying data over to user buffers. The costs of walking over |> the user buffer once (if it is done like that, and that memory |> should be hot even, then) seem to be low compared to collecting |> the directory entry information. | |Sure, I'm fine with this new API specifying that those deleted entries be |suppressed, but it's inconsistent to do that and then insist that d_name's |nature be unspecified to make the API compatible with existing |implementation, despite making it more annoying to program with. As far as i understood the d_name nature was about reuse of existing structures which use fixed-size names, and has nothing to do with "currently" non-existing files, to which i count whiteouts too, by the way. |>|If the latter, then require it to work with very small buffers, require |> all |>|entries to have valid d_name and d_type, and specify d_name as a FAM. |> |> That d_type from the start would be great. | |If you mean "require d_type to have a real value and never DT_UNKNOWN", |then that's a step further which I don't think any existing getd*ent* API |has taken and which would make this API _slower_ than readdir() on some |implementation+filesystem combos. I even have had to add a fallback path which "check[s] whether offsetof()-_namlen is 2 less than d_reclen, but otherwise use the fallback path", because d_type as such could be missing on earlier Linux! # define _DT_OF_DE(X) (r(ui1*,X)[(X)->d_reclen - 1]) Ha! And Yay! again! --steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt)
