On 24.08.2019 16:39, Ivan Zhakov wrote: > On Thu, 25 Jul 2019 at 15:58, Ivan Zhakov <i...@visualsvn.com> wrote: >> On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov <i...@visualsvn.com> wrote: >>> On Wed, 3 Jul 2019 at 18:37, Joe Orton <jor...@redhat.com> wrote: >>>> On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote: >>>>> They also make the existing old API unusable for many cases thus >>>>> making a switch to the new API mandatory, even though it doesn't have >>>>> to be so (see below). >>>>> >>>>> APR 1.x did not specify that the result of apr_dir_read() has to be >>>>> allocated >>>>> in dir->pool: >>>>> >>>>> >>>>> https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8 >>>>> >>>>> This function does not accept a pool argument, and I think that it's >>>>> natural >>>>> to assume limited lifetime in such case. This is what the current major >>>>> APR >>>>> users (httpd, svn) do, and other users should probably be ready for that >>>>> too, >>>>> since on Win32 the subsequent calls to apr_dir_read() already invalidate >>>>> the >>>>> previous apr_finfo_t. >>>> Are there many examples in the APR API where returned strings are not >>>> either pool-allocated or constant (process lifetime)? It seems unusual >>>> and very surprising to me. >>>> >>>> A hypothetical API consumer can have made either assumption: >>>> >>>> a) of constant memory (true with limits on Win32, breaks for Unix), or >>>> b) of pool-allocated string lifetime (works for Unix, breaks for Win32) >>>> >>>> neither is documented and both are broken by current implementations. It >>>> is impossible to satisfy both so anybody can argue that fixing either >>>> implementation to match the other is a regression. >>>> >>>> Doubling down on (a) over (b) seems strongly inferior to me, the API >>>> cannot support this without introducing runtime overhead for all callers >>>> (a new iterpool in the dir object), so I'd rather fix this properly with >>>> a new API. >>>> >>>> I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at >>>> the same as introducing _pread() if desired - making the strdup only >>>> happen for _pread would only be a minor tweak, not a whole new subpool. >>>> >>> There is example of apr_hash_t.ITERATOR that is statically allocated in >>> the hash object and returned to the caller. >>> >>> Also the native readdir() API behaves the same way [1]: >>> [[[ >>> The returned pointer, and pointers within the structure, might be >>> invalidated >>> or the structure or the storage areas might be overwritten by a subsequent >>> call >>> to readdir() on the same directory stream. >>> ]]] >>> >>> From this perspective the current behavior of apr_dir_read on Unix looks >>> inconsistent and surprising for the API user because it is impossible to use >>> in a loop without unbounded memory usage. >>> >>> I strongly agree that the revved version of this function that accepts a >>> POOL >>> argument will be good from both usage and implementation terms. I would be >>> +1 >>> on adding such API. But I also think that unless we fix the original issue >>> by using the preallocated buffer/iterpool, the whole thing would still be >>> problematic for any existing API user. >>> >>> More detailed: if we take this opportunity to choose and document the API to >>> consistently return a temporary result, then as the caller I can either >>> process >>> the data during iteration or manually put it into a long-living pool if >>> that is >>> necessary. I think that all current callers work with the API this way. And >>> I >>> can also switch to the new revved function to manually specify the pool and >>> better control the allocations. If I would like to support both APR 1.x and >>> 2.x >>> the compatibility is also straightforward to achieve. >>> >>> If we instead choose an option where the results may be allocated >>> in the dir object pool, then as the caller I cannot avoid unbounded memory >>> usage when supporting both APR 1.x and 2.x. And this approach also >>> introduces >>> the unavoidable the unbounded memory usage for all Win32 users that >>> do not update their code. >>> >>> [1] >>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475 >>> >> Is there any feedback or thoughts on these proposed changes? >> >> Thanks! >> > For what it's worth, I'm -1 on the changes done in r1862071 and > r1862435 for the reasons listed above. > > PS: No idea if I can actually veto changes, because while I'm a > committer on the APR project, I am not in its PMC. But "-1" properly > expresses my technical opinion on this topic.
I agree with your analysis. I think it would be best if you just implemented your proposal, including the revised version of the API. The unbounded-memory case you describe is pretty much a showstopper. The allocation choices do have to be documented better, IMO. That would be trunk and 1.7.x, although we can't add a revised API in the 1.7.x line since it's released already. For compatibility it would probably be best to leave apr_dir_read on trunk without the extra pool parameter. -- Brane