On Wed, Oct 7, 2015 at 1:53 PM, Julian Foad <julianf...@gmail.com> wrote:

> Ivan Zhakov wrote:
> > Bert Huijben wrote:
> >>> URL: http://svn.apache.org/viewvc?rev=1707196&view=rev
> >>> Log:
> >>> Slightly optimize svn_stringbuf_from_stream() to avoid allocating twice
> >>> more memory and unnecessary memcpy() when LEN_HINT is equal to final
> >>> stringbuf
> >>> length.
> >>>
> >>> * subversion/libsvn_subr/stream.c
> >>>   (svn_stringbuf_from_stream): Always preallocate LEN_HINT +
> >>> MIN_READ_SIZE
> >>>    bytes to be able perform final read without stringbuf reallocation.
> >>
> >> Can you explain why hint + MIN_READ_SIZE instead of something like
> >> MAX(len_hint+1, MIN_READ_SIZE)
> >>
> >> I don't know what MIN_READ_SIZE is from just this patch, but it could
> easily be
> >> something like 16 Kbyte, while len_hint could be something like 16 for
> a file like
> >> 'format' that was just statted to obtain the actual size
> >>
> >> 16 Kbyte + 16 bytes for a 16 bytes file looks a bit large... And
> allocating 16 + 32
> >> bytes is far more efficient than allocating that huge chunk at once.
> >>
> > MIN_READ_SIZE is 64 bytes and it's locally defined in this function.
> > We cannot use MAX(len_hint+1, MIN_READ_SIZE) because main loop
> > reallocates buffer if don't have MIN_READ_SIZE remaining. The current
> > code assumes that MIN_READ_SIZE is small value and I decided to leave
> > this code intact. It could be improved of course, but I wanted to make
> > minimum change first.
>
> I don't think that was the right fix. One deficiency with the code is
> that it doesn't break out of the loop when end-of-stream is detected,
> which can be reported by a 'short' read. Your change doesn't solve
> that problem in general, although it does solve it for the case where
> the supplied length hint is exactly right.
>

The short read detection got changed in r1561688.
Before that, svn_stream_read would be called for
which the short read detection was correct. Since
then we have a somewhat inefficient double EOS
check.

I guess the correct way of doing this is revert Ivan's
change and apply something like the attached patch.


> We also have the function
>
>   svn_string_from_stream()
>
> which performs a virtually identical task. These two functions have
> completely different implementations. svn_string_from_stream() does
> break out of the loop as soon as it detects end-of-stream reported as
> a short read. They also use completely different read sizes
> (MIN_READ_SIZE = 64 bytes vs. SVN__STREAM_CHUNK_SIZE = 16384).
>
> That is silly: there is no good reason for that, and it is a waste of
> effort maintaining them separately, and discussing separate
> optimizations for each.
>
> We should make those two functions share the same "best" implementation.
>

Both functions are rarely used within SVN. I'd say,
implement a svn_string_from_stream as "read into
stringbuf, alloc'ed in scratch_pool + copy to final string".
That preserves and actually improves upon it's current
memory usage pattern while reusing the stringbuf code.

-- Stefan^2.
Index: stream.c
===================================================================
--- stream.c	(revision 1706628)
+++ stream.c	(working copy)
@@ -1488,7 +1488,7 @@
 {
 #define MIN_READ_SIZE 64
 
-  apr_size_t to_read = 0;
+  apr_size_t to_read, gotten;
   svn_stringbuf_t *text
     = svn_stringbuf_create_ensure(len_hint ? len_hint : MIN_READ_SIZE,
                                   result_pool);
@@ -1496,13 +1496,17 @@
   do
     {
       to_read = text->blocksize - 1 - text->len;
-      SVN_ERR(svn_stream_read_full(stream, text->data + text->len, &to_read));
-      text->len += to_read;
+      if (to_read == 0)
+        {
+          svn_stringbuf_ensure(text, text->blocksize * 2);
+          to_read = text->blocksize - 1 - text->len;
+        }
 
-      if (to_read && text->blocksize < text->len + MIN_READ_SIZE)
-        svn_stringbuf_ensure(text, text->blocksize * 2);
+      gotten = to_read;
+      SVN_ERR(svn_stream_read_full(stream, text->data + text->len, &gotten));
+      text->len += gotten;
     }
-  while (to_read);
+  while (gotten == to_read);
 
   text->data[text->len] = '\0';
   *str = text;

Reply via email to