On Fri, 2011-06-03, Daniel Shahaf wrote: > Julian Foad wrote on Fri, Jun 03, 2011 at 09:52:21 +0100: > > Hi Stefan2 and others. > > > > This patch simplifies code in RA-svn/marshal.c by using a single code > > path to read both short and long strings efficiently. Using a single > > code path is beneficial for test coverage. > > > > It is an alternative to r1028352 which merged r985606 and r1028092 from > > the performance branch. > > > > I have run the test suite over RA-svn (with BDB) but haven't done any > > more testing than that. > > Unless you built with SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD set to > something less than 20, this may not have tested the "Large string" code > path (when the loop loops).
Right. I've now tested that way (with threshold = 16 bytes) and revealed a bug: I calculated the 'dest' address to read into before calling stringbuf_ensure, but when the latter re-allocates memory it moves the string so my 'dest' was wrong. Fixed. > > It's just something I noticed while looking at > > that code, it's not solving a problem I've noticed or anything like > > that. By inspection I believe it is more efficient than > > read_long_string() was, and unchanged for short strings, but if you're > > able to do any performance analysis on it, that would be great. > > > > - Julian > > > > > Simplify and optimize RA-svn's string marshalling. > > > > This largely reverts r1028352 which merged r985606 and r1028092 from the > > performance branch. That change used separate functions to handle short and > > long strings. This change instead improves the original code to handle both > > short and long strings efficiently. > > > > Compared with the pre-r1028352 code, this version uses a 1-MB chunk size > > instead of 4-KB and eliminates one memcpy of the data. > > > > * subversion/libsvn_ra_svn/marshal.c > > (read_long_string): Remove. > > (read_string): Handle long strings with the same code as short strings, by > > reading directly into pre-allocated space, but pre-allocate no more than > > a reasonable chunk size at a time. > > --This line, and those below, will be ignored-- > > > > Index: subversion/libsvn_ra_svn/marshal.c > > =================================================================== > > --- subversion/libsvn_ra_svn/marshal.c (revision 1130931) > > +++ subversion/libsvn_ra_svn/marshal.c (working copy) > > @@ -563,36 +563,6 @@ svn_error_t *svn_ra_svn_write_tuple(svn_ > > > > /* --- READING DATA ITEMS --- */ > > > > -/* Read LEN bytes from CONN into a supposedly empty STRINGBUF. > > - * POOL will be used for temporary allocations. */ > > -static svn_error_t * > > -read_long_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool, > > - svn_stringbuf_t *stringbuf, apr_uint64_t len) > > -{ > > - char readbuf[4096]; > > - apr_size_t readbuf_len; > > - > > - /* We can't store strings longer than the maximum size of apr_size_t, > > - * so check for wrapping */ > > - if (((apr_size_t) len) < len) > > Why did you remove this check? I was removing all this relatively new code and it wasn't there before, and I wasn't convinced it was very important. But it would provide a quick exit (instead of filling up all available memory) in some error cases or potential DOS attempts so I'll re-instate it. > > - return svn_error_create(SVN_ERR_RA_SVN_MALFORMED_DATA, NULL, > > - _("String length larger than maximum")); > > - > > - while (len) > > - { > > - readbuf_len = len > sizeof(readbuf) ? sizeof(readbuf) : > > (apr_size_t)len; > > - > > - SVN_ERR(readbuf_read(conn, pool, readbuf, readbuf_len)); > > - /* Read into a stringbuf_t to so we don't allow the sender to > > allocate > > - * an arbitrary amount of memory without actually sending us that > > much > > - * data */ > > - svn_stringbuf_appendbytes(stringbuf, readbuf, readbuf_len); > > - len -= readbuf_len; > > - } > > - > > - return SVN_NO_ERROR; > > -} > > - > > /* Read LEN bytes from CONN into already-allocated structure ITEM. > > * Afterwards, *ITEM is of type 'SVN_RA_SVN_STRING', and its string > > * data is allocated in POOL. */ > > @@ -601,42 +571,34 @@ static svn_error_t *read_string(svn_ra_s > > { [...] > > + svn_stringbuf_ensure(stringbuf, stringbuf->len + readbuf_len); > > Need ->len + ->len + 1, for NUL. (Oddly, this API requires this > off-by-one while _create_ensure() does it for you.) Right; thanks. Fixed. > > + SVN_ERR(readbuf_read(conn, pool, dest, readbuf_len)); > > + > > + stringbuf->len += readbuf_len; > > + stringbuf->data[stringbuf->len] = '\0'; > > + len -= readbuf_len; > > What happens if, at this point, there are less than READBUF_LEN bytes > available? > > (e.g., if the client terminates the connection in the middle of > a string literal.) This part of the behaviour is unchanged from the original code. I don't know what happens, but I imagine readbuf_read() returns an error, perhaps SVN_ERR_RA_SVN_CONNECTION_CLOSED which I see readbuf_input() can do. The revised patch is attached. - Julian
Simplify and optimize RA-svn's string marshalling. This largely reverts r1028352 which merged r985606 and r1028092 from the performance branch. That change used separate functions to handle short and long strings. This change instead improves the original code to handle both short and long strings efficiently. Compared with the pre-r1028352 code, this version uses a 1-MB chunk size instead of 4-KB and eliminates one memcpy of the data. * subversion/libsvn_ra_svn/marshal.c (read_long_string): Remove. (read_string): Handle long strings with the same code as short strings, by reading directly into pre-allocated space, but pre-allocate no more than a reasonable chunk size at a time. --This line, and those below, will be ignored-- Index: subversion/libsvn_ra_svn/marshal.c =================================================================== --- subversion/libsvn_ra_svn/marshal.c (revision 1132610) +++ subversion/libsvn_ra_svn/marshal.c (working copy) @@ -563,14 +563,13 @@ svn_error_t *svn_ra_svn_write_tuple(svn_ /* --- READING DATA ITEMS --- */ -/* Read LEN bytes from CONN into a supposedly empty STRINGBUF. - * POOL will be used for temporary allocations. */ -static svn_error_t * -read_long_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool, - svn_stringbuf_t *stringbuf, apr_uint64_t len) +/* Read LEN bytes from CONN into already-allocated structure ITEM. + * Afterwards, *ITEM is of type 'SVN_RA_SVN_STRING', and its string + * data is allocated in POOL. */ +static svn_error_t *read_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool, + svn_ra_svn_item_t *item, apr_uint64_t len) { - char readbuf[4096]; - apr_size_t readbuf_len; + svn_stringbuf_t *stringbuf; /* We can't store strings longer than the maximum size of apr_size_t, * so check for wrapping */ @@ -578,67 +577,36 @@ read_long_string(svn_ra_svn_conn_t *conn return svn_error_create(SVN_ERR_RA_SVN_MALFORMED_DATA, NULL, _("String length larger than maximum")); + /* Read the string in chunks. The chunk size is large enough to avoid + * re-allocation in typical cases, and small enough to ensure we do not + * pre-allocate an unreasonable amount of memory if (perhaps due to + * network data corruption or a DOS attack), we receive a bogus claim that + * a very long string is going to follow. In that case, we start small + * and wait for all that data to actually show up. This does not fully + * prevent DOS attacks but makes them harder (you have to actually send + * gigabytes of data). */ + stringbuf = svn_stringbuf_create_ensure( + (apr_size_t)(len < SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD + ? len : SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD), + pool); + + /* Read the string data directly into the string structure. + * Do it iteratively, if necessary. */ while (len) { - readbuf_len = len > sizeof(readbuf) ? sizeof(readbuf) : (apr_size_t)len; + apr_size_t readbuf_len + = (apr_size_t)(len < SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD + ? len : SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD); + + svn_stringbuf_ensure(stringbuf, stringbuf->len + readbuf_len + 1); + SVN_ERR(readbuf_read(conn, pool, stringbuf->data + stringbuf->len, + readbuf_len)); - SVN_ERR(readbuf_read(conn, pool, readbuf, readbuf_len)); - /* Read into a stringbuf_t to so we don't allow the sender to allocate - * an arbitrary amount of memory without actually sending us that much - * data */ - svn_stringbuf_appendbytes(stringbuf, readbuf, readbuf_len); + stringbuf->len += readbuf_len; + stringbuf->data[stringbuf->len] = '\0'; len -= readbuf_len; } - return SVN_NO_ERROR; -} - -/* Read LEN bytes from CONN into already-allocated structure ITEM. - * Afterwards, *ITEM is of type 'SVN_RA_SVN_STRING', and its string - * data is allocated in POOL. */ -static svn_error_t *read_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool, - svn_ra_svn_item_t *item, apr_uint64_t len) -{ - svn_stringbuf_t *stringbuf; - - /* We should not use large strings in our protocol. However, we may - * receive a claim that a very long string is going to follow. In that - * case, we start small and wait for all that data to actually show up. - * This does not fully prevent DOS attacs but makes them harder (you - * have to actually send gigabytes of data). - */ - if (len > SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD) - { - /* This string might take a large amount of memory. Don't allocate - * the whole buffer at once, so to prevent OOM issues by corrupted - * network data. - */ - stringbuf = svn_stringbuf_create("", pool); - SVN_ERR(read_long_string(conn, pool, stringbuf, len)); - } - else - { - /* This is a reasonably sized string. So, provide a buffer large - * enough to prevent re-allocation as long as the data transmission - * is not flawed. - */ - stringbuf = svn_stringbuf_create_ensure((apr_size_t)len, pool); - - /* Read the string data directly into the string structure. - * Do it iteratively, if necessary. - */ - while (len) - { - apr_size_t readbuf_len = (apr_size_t)len; - char *dest = stringbuf->data + stringbuf->len; - SVN_ERR(readbuf_read(conn, pool, dest, readbuf_len)); - - stringbuf->len += readbuf_len; - stringbuf->data[stringbuf->len] = '\0'; - len -= readbuf_len; - } - } - /* Return the string properly wrapped into an RA_SVN item. */ item->kind = SVN_RA_SVN_STRING; item->u.string = svn_stringbuf__morph_into_string(stringbuf);