On 7 October 2015 at 14:53, Julian Foad <julianf...@gmail.com> wrote: > Ivan Zhakov wrote: >> Bert Huijben wrote: >>>> URL: http://svn.apache.org/viewvc?rev=1707196&view=rev >>>> Log: >>>> Slightly optimize svn_stringbuf_from_stream() to avoid allocating twice >>>> more memory and unnecessary memcpy() when LEN_HINT is equal to final >>>> stringbuf >>>> length. >>>> >>>> * subversion/libsvn_subr/stream.c >>>> (svn_stringbuf_from_stream): Always preallocate LEN_HINT + >>>> MIN_READ_SIZE >>>> bytes to be able perform final read without stringbuf reallocation. >>> >>> Can you explain why hint + MIN_READ_SIZE instead of something like >>> MAX(len_hint+1, MIN_READ_SIZE) >>> >>> I don't know what MIN_READ_SIZE is from just this patch, but it could >>> easily be >>> something like 16 Kbyte, while len_hint could be something like 16 for a >>> file like >>> 'format' that was just statted to obtain the actual size >>> >>> 16 Kbyte + 16 bytes for a 16 bytes file looks a bit large... And allocating >>> 16 + 32 >>> bytes is far more efficient than allocating that huge chunk at once. >>> >> MIN_READ_SIZE is 64 bytes and it's locally defined in this function. >> We cannot use MAX(len_hint+1, MIN_READ_SIZE) because main loop >> reallocates buffer if don't have MIN_READ_SIZE remaining. The current >> code assumes that MIN_READ_SIZE is small value and I decided to leave >> this code intact. It could be improved of course, but I wanted to make >> minimum change first. > > I don't think that was the right fix. One deficiency with the code is > that it doesn't break out of the loop when end-of-stream is detected, > which can be reported by a 'short' read. It's a different problem and I my plan was to fix in another commit. I don't think that problem in existing code should be reason to do not fix one of them.
> Your change doesn't solve > that problem in general, although it does solve it for the case where > the supplied length hint is exactly right. > > We also have the function > > svn_string_from_stream() > > which performs a virtually identical task. These two functions have > completely different implementations. svn_string_from_stream() does > break out of the loop as soon as it detects end-of-stream reported as > a short read. They also use completely different read sizes > (MIN_READ_SIZE = 64 bytes vs. SVN__STREAM_CHUNK_SIZE = 16384). > Did you notice that current svn_stringbuf_from_stream() implementation increase read size twice on every loop iteration? > That is silly: there is no good reason for that, and it is a waste of > effort maintaining them separately, and discussing separate > optimizations for each. > > We should make those two functions share the same "best" implementation. > I agree that it would be nice to have the same "best" implementation for both functions. But svn_string_from_stream() function less efficient it terms of memory operations, while reading stream using bigger chunks at the beginning. Another difference that svn_string_from_stream() closes source stream, while svn_stringbuf_from_stream() don't. Disclaimer: I didn't wrote svn_string_from_stream() nor svn_stringbuf_from_stream() and I'm not aware why these functions have different implementations. I've just spotted one particular problem and fixed it. -- Ivan Zhakov