On 24.08.2019 16:39, Ivan Zhakov wrote:
> On Thu, 25 Jul 2019 at 15:58, Ivan Zhakov <i...@visualsvn.com> wrote:
>> On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov <i...@visualsvn.com> wrote:
>>> On Wed, 3 Jul 2019 at 18:37, Joe Orton <jor...@redhat.com> wrote:
>>>> On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
>>>>> They also make the existing old API unusable for many cases thus
>>>>> making a switch to the new API mandatory, even though it doesn't have
>>>>> to be so (see below).
>>>>>
>>>>> APR 1.x did not specify that the result of apr_dir_read() has to be 
>>>>> allocated
>>>>> in dir->pool:
>>>>>
>>>>>   
>>>>> https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
>>>>>
>>>>> This function does not accept a pool argument, and I think that it's 
>>>>> natural
>>>>> to assume limited lifetime in such case. This is what the current major 
>>>>> APR
>>>>> users (httpd, svn) do, and other users should probably be ready for that 
>>>>> too,
>>>>> since on Win32 the subsequent calls to apr_dir_read() already invalidate 
>>>>> the
>>>>> previous apr_finfo_t.
>>>> Are there many examples in the APR API where returned strings are not
>>>> either pool-allocated or constant (process lifetime)?  It seems unusual
>>>> and very surprising to me.
>>>>
>>>> A hypothetical API consumer can have made either assumption:
>>>>
>>>> a) of constant memory (true with limits on Win32, breaks for Unix), or
>>>> b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
>>>>
>>>> neither is documented and both are broken by current implementations. It
>>>> is impossible to satisfy both so anybody can argue that fixing either
>>>> implementation to match the other is a regression.
>>>>
>>>> Doubling down on (a) over (b) seems strongly inferior to me, the API
>>>> cannot support this without introducing runtime overhead for all callers
>>>> (a new iterpool in the dir object), so I'd rather fix this properly with
>>>> a new API.
>>>>
>>>> I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
>>>> the same as introducing _pread() if desired - making the strdup only
>>>> happen for _pread would only be a minor tweak, not a whole new subpool.
>>>>
>>> There is example of apr_hash_t.ITERATOR that is statically allocated in
>>> the hash object and returned to the caller.
>>>
>>> Also the native readdir() API behaves the same way [1]:
>>> [[[
>>> The returned pointer, and pointers within the structure, might be 
>>> invalidated
>>> or the structure or the storage areas might be overwritten by a subsequent 
>>> call
>>> to readdir() on the same directory stream.
>>> ]]]
>>>
>>> From this perspective the current behavior of apr_dir_read on Unix looks
>>> inconsistent and surprising for the API user because it is impossible to use
>>> in a loop without unbounded memory usage.
>>>
>>> I strongly agree that the revved version of this function that accepts a 
>>> POOL
>>> argument will be good from both usage and implementation terms. I would be 
>>> +1
>>> on adding such API. But I also think that unless we fix the original issue
>>> by using the preallocated buffer/iterpool, the whole thing would still be
>>> problematic for any existing API user.
>>>
>>> More detailed: if we take this opportunity to choose and document the API to
>>> consistently return a temporary result, then as the caller I can either 
>>> process
>>> the data during iteration or manually put it into a long-living pool if 
>>> that is
>>> necessary. I think that all current callers work with the API this way. And 
>>> I
>>> can also switch to the new revved function to manually specify the pool and
>>> better control the allocations. If I would like to support both APR 1.x and 
>>> 2.x
>>> the compatibility is also straightforward to achieve.
>>>
>>> If we instead choose an option where the results may be allocated
>>> in the dir object pool, then as the caller I cannot avoid unbounded memory
>>> usage when supporting both APR 1.x and 2.x. And this approach also 
>>> introduces
>>> the unavoidable the unbounded memory usage for all Win32 users that
>>> do not update their code.
>>>
>>> [1] 
>>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475
>>>
>> Is there any feedback or thoughts on these proposed changes?
>>
>> Thanks!
>>
> For what it's worth, I'm -1 on the changes done in r1862071 and
> r1862435 for the reasons listed above.
>
> PS: No idea if I can actually veto changes, because while I'm a
> committer on the APR project, I am not in its PMC. But "-1" properly
> expresses my technical opinion on this topic.


I agree with your analysis. I think it would be best if you just
implemented your proposal, including the revised version of the API. The
unbounded-memory case you describe is pretty much a showstopper. The
allocation choices do have to be documented better, IMO.

That would be trunk and 1.7.x, although we can't add a revised API in
the 1.7.x line since it's released already. For compatibility it would
probably be best to leave apr_dir_read on trunk without the extra pool
parameter.

-- Brane

Reply via email to