On Wed, Oct 7, 2015 at 1:53 PM, Julian Foad <julianf...@gmail.com> wrote:
> Ivan Zhakov wrote: > > Bert Huijben wrote: > >>> URL: http://svn.apache.org/viewvc?rev=1707196&view=rev > >>> Log: > >>> Slightly optimize svn_stringbuf_from_stream() to avoid allocating twice > >>> more memory and unnecessary memcpy() when LEN_HINT is equal to final > >>> stringbuf > >>> length. > >>> > >>> * subversion/libsvn_subr/stream.c > >>> (svn_stringbuf_from_stream): Always preallocate LEN_HINT + > >>> MIN_READ_SIZE > >>> bytes to be able perform final read without stringbuf reallocation. > >> > >> Can you explain why hint + MIN_READ_SIZE instead of something like > >> MAX(len_hint+1, MIN_READ_SIZE) > >> > >> I don't know what MIN_READ_SIZE is from just this patch, but it could > easily be > >> something like 16 Kbyte, while len_hint could be something like 16 for > a file like > >> 'format' that was just statted to obtain the actual size > >> > >> 16 Kbyte + 16 bytes for a 16 bytes file looks a bit large... And > allocating 16 + 32 > >> bytes is far more efficient than allocating that huge chunk at once. > >> > > MIN_READ_SIZE is 64 bytes and it's locally defined in this function. > > We cannot use MAX(len_hint+1, MIN_READ_SIZE) because main loop > > reallocates buffer if don't have MIN_READ_SIZE remaining. The current > > code assumes that MIN_READ_SIZE is small value and I decided to leave > > this code intact. It could be improved of course, but I wanted to make > > minimum change first. > > I don't think that was the right fix. One deficiency with the code is > that it doesn't break out of the loop when end-of-stream is detected, > which can be reported by a 'short' read. Your change doesn't solve > that problem in general, although it does solve it for the case where > the supplied length hint is exactly right. > The short read detection got changed in r1561688. Before that, svn_stream_read would be called for which the short read detection was correct. Since then we have a somewhat inefficient double EOS check. I guess the correct way of doing this is revert Ivan's change and apply something like the attached patch. > We also have the function > > svn_string_from_stream() > > which performs a virtually identical task. These two functions have > completely different implementations. svn_string_from_stream() does > break out of the loop as soon as it detects end-of-stream reported as > a short read. They also use completely different read sizes > (MIN_READ_SIZE = 64 bytes vs. SVN__STREAM_CHUNK_SIZE = 16384). > > That is silly: there is no good reason for that, and it is a waste of > effort maintaining them separately, and discussing separate > optimizations for each. > > We should make those two functions share the same "best" implementation. > Both functions are rarely used within SVN. I'd say, implement a svn_string_from_stream as "read into stringbuf, alloc'ed in scratch_pool + copy to final string". That preserves and actually improves upon it's current memory usage pattern while reusing the stringbuf code. -- Stefan^2.
Index: stream.c =================================================================== --- stream.c (revision 1706628) +++ stream.c (working copy) @@ -1488,7 +1488,7 @@ { #define MIN_READ_SIZE 64 - apr_size_t to_read = 0; + apr_size_t to_read, gotten; svn_stringbuf_t *text = svn_stringbuf_create_ensure(len_hint ? len_hint : MIN_READ_SIZE, result_pool); @@ -1496,13 +1496,17 @@ do { to_read = text->blocksize - 1 - text->len; - SVN_ERR(svn_stream_read_full(stream, text->data + text->len, &to_read)); - text->len += to_read; + if (to_read == 0) + { + svn_stringbuf_ensure(text, text->blocksize * 2); + to_read = text->blocksize - 1 - text->len; + } - if (to_read && text->blocksize < text->len + MIN_READ_SIZE) - svn_stringbuf_ensure(text, text->blocksize * 2); + gotten = to_read; + SVN_ERR(svn_stream_read_full(stream, text->data + text->len, &gotten)); + text->len += gotten; } - while (to_read); + while (gotten == to_read); text->data[text->len] = '\0'; *str = text;