Re-implementing asprintf [was Re: Total Hack to the finish]

Amitha Perera Mon, 5 Jul 1999 22:30:59 -0700
On 05 Jul 1999 20:20:10 -0500, Rob Browning wrote:
> It would probably be worth a little time to go look at the liberty
> code and see if we can just snatch their implementation without too
> much rewriting.  My guess is that we can't, but if we could, then we'd
> be bug/feature compatible for the most part, and we wouldn't have to
> do the implementation ourself.

I've looked (briefly) at both the glibc implementation and the
libiberty implementation, and I don't think either is really what is
required here.

The glibc implementation uses a special form of file I/O, where the
"file" is entirely contained in a buffer. Every time the buffer
overflows, the buffer size is increased by 100.

The libiberty implementation parses the format string and makes a
worst case guess for the length of the resulting string. For integers
and such, it guesses 30. For strings, it computes the string
length. For floating point types, it assumes *307* since that is the
largest exponent allowed for an IEEE double. This means that a call to
  asprintf(&s, "%.2f %.2f", 1.0, 2.0);
will return with s pointing to a block of more than 600 bytes! The
formatting itself is done via a call to vsprintf. A realloc is *not*
done after vsprintf returns!
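
To make the arithmetic concrete, here is a minimal sketch of that kind of
worst-case estimate. It is not libiberty's code: worst_case_length and the
two constants are hypothetical names, and the parser is deliberately
simplified (it skips flags, width and precision, and does not handle length
modifiers such as "l").

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Per-conversion guesses taken from the description above. */
enum { INT_GUESS = 30, FLOAT_GUESS = 307 };

/* Hypothetical helper: upper bound on the bytes needed to format fmt.
 * Simplified on purpose; length modifiers ("l", "h", ...) are not handled. */
size_t worst_case_length(const char *fmt, ...)
{
    size_t total;
    va_list ap;
    const char *p;

    assert(fmt != NULL);
    total = strlen(fmt) + 1;          /* literal text plus terminating NUL */
    va_start(ap, fmt);
    for (p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        /* Skip flags, width and precision. */
        while (*p != '\0' && strchr("-+ #0123456789.", *p) != NULL)
            p++;
        switch (*p) {
        case 'd': case 'i': case 'u': case 'x': case 'c':
            (void)va_arg(ap, int);
            total += INT_GUESS;
            break;
        case 'f': case 'e': case 'g':
            (void)va_arg(ap, double);
            total += FLOAT_GUESS;
            break;
        case 's':
            total += strlen(va_arg(ap, const char *));
            break;
        default:                      /* "%%" or unrecognised: no extra room */
            break;
        }
        if (*p == '\0')
            break;                    /* format ended in the middle of a spec */
    }
    va_end(ap);
    return total;
}
```

With this sketch, worst_case_length("%.2f %.2f", 1.0, 2.0) comes to 624
bytes, while the formatted result "1.00 2.00" needs only 10 including the
terminating NUL.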

However asprintf is implemented, it must result in calls to malloc and
realloc every so often, when the buffer runs out. A very good,
general-purpose implementation would reduce the number of allocs by
starting with a good guess of the final length. In the case of
gnucash, the question is whether (1) the guessing code is more
efficient than a malloc followed by a realloc, and (2) it is
sufficiently simple to be bug free.

(2) is especially important since gnucash is still under development:
we do not need more places where bugs could crop up. I use one malloc
and one realloc for (1) because in most cases (in gnucash), an initial
estimate of, say, 100 characters is more than enough. Then, we realloc
to what was actually used. In the general case, more than one realloc
may be necessary, but this case will not happen often.

One other point to keep in mind is the 80/20 (or 90/10, depending on
what you read) rule. I doubt that even doubling the execution time of
asprintf would make any significant difference to the user. (I could
be wrong. Profiling will tell.)

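I suggest implementing asprintf along these lines:
  #include <stdarg.h>
  #include <stdio.h>
  #include <stdlib.h>

  int
  asprintf( char** s, const char* f, ... )
  {
     va_list ap;
     char* buf;
     char* tmp;
     int buf_len = 100;   /* initial guess; enough for most gnucash strings */
     int print_len;

     buf = (char*)malloc(buf_len);
     if (buf == NULL) return -1;
     va_start(ap, f);     /* the argument list is restarted for every attempt */
     print_len = vsnprintf( buf, buf_len, f, ap );
     va_end(ap);
     while (buf_len <= print_len) {
       buf_len *= 2;
       tmp = (char*)realloc(buf, buf_len);
       if (tmp == NULL) { free(buf); return -1; }
       buf = tmp;
       va_start(ap, f);
       print_len = vsnprintf( buf, buf_len, f, ap );
       va_end(ap);
     }
     /* Assumes C99 vsnprintf: a negative return means an output error. */
     if (print_len < 0) { free(buf); return -1; }
     tmp = (char*)realloc(buf, print_len+1);   /* trim to what was used */
     *s = (tmp != NULL) ? tmp : buf;
     return print_len;
  }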

If we were not to use snprintf, then the libiberty implementation may
be the way to go. However, all the platforms I have ready access to
(Linux, FreeBSD, IRIX 6.x, Solaris 2.x) have snprintf.

To reiterate, the code above would in most cases (in gnucash) do
one malloc, a couple of comparisons, one call to snprintf, and one
realloc. Not too expensive, I think.
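
To put a rough number on the uncommon case of a long string, the sketch below
counts how many times a buffer has to grow before the output fits, once with
a fixed increment (as described for glibc above) and once with doubling (as
in the loop above). The function names are hypothetical, and "needed" is
assumed to include the terminating NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Growth steps needed when every step adds a fixed number of bytes. */
unsigned grow_steps_fixed(size_t initial, size_t increment, size_t needed)
{
    unsigned steps = 0;
    size_t size = initial;

    assert(initial > 0 && increment > 0);
    while (size < needed) {
        size += increment;
        steps++;
    }
    return steps;
}

/* Growth steps needed when every step doubles the buffer. */
unsigned grow_steps_doubling(size_t initial, size_t needed)
{
    unsigned steps = 0;
    size_t size = initial;

    assert(initial > 0);
    while (size < needed) {
        size *= 2;
        steps++;
    }
    return steps;
}
```

Starting from 100 bytes, a 10-byte result needs no growth either way. A
5000-byte result needs 49 steps when growing by 100, but only 6 when
doubling (100, 200, 400, 800, 1600, 3200, 6400).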

> The cheezy solution would be to use snprintf with larger and larger
> malloced input strings, until you found one that would fit the
> string, hopefully using a good initial guess so that you don't have
> to make too many tries.

This is, of course, the solution I've outlined above. However, I have
hopefully managed to argue that the solution is not as cheezy as you'd
imagine. Am I missing something obvious?

Amitha.
----- %< -------------------------------------------- >% ------
The GnuCash / X-Accountant Mailing List