On Wed, Jun 15, 2011 at 03:28:16PM +0300, Timo Teräs wrote:
> On 06/15/2011 03:14 PM, Rich Felker wrote:
> > On Wed, Jun 15, 2011 at 01:49:30PM +0300, Timo Teräs wrote:
> >> Hi all,
> >>
> >> As recently discussed on the 'Significant performance problems with
> >> modprobe', the config_* and xmalloc_fget* API has some performance issues:
> >> 1. It uses locking getc
> >> 2. It does malloc/realloc/free for each line read
> >>
> >> Now (1) is relatively simply fixable with getdelim(), getline() or
> >> getc_unlocked(). And even get_line_from_file.c has comments that fixing
> >> of (1) should about double the speed.
> > 
> > The difference between getc_unlocked and getc should be fairly small.
> > If getc_unlocked is a macro, probably at best 2.5-3x, and if it's a
> > function, probably at best 1.5-2x. getline/getdelim would make a much
> > bigger difference.
> 
> What makes you think so?
> 
> E.g. uclibc getline and getdelim implementations internally use
> __GETC_UNLOCKED macro, which is what you would get with getc_unlocked() too.

I'm assuming a non-crap implementation where getdelim performs memchr
in the buffer. Of course if you implement getdelim in terms of slow
primitives it will be slow. Both glibc and my implementation in musl
are much faster than repeatedly calling getc_unlocked.

> Actually getc_unlocked is likely faster, because getdelim/getline still
> do regular locking once for each line read. And there just simple isn't
> _unlocked variants of them.
> 
> We should be really using _unlocked here anyway if available. Since the
> parsing api is not intended to be multithreaded anyway, where as regular
> stdio is.

Locking should not be performed unless there's more than one thread
anyway. The only cost of the locked variants is a single conditional
to determine that locking does not need to be performed, and the fact
that the single-character locked functions (getc etc.) won't be
inlined. But for larger operations like getdelim, fread, etc. the
difference is not measurable whatsoever. That's probably why the
standard doesn't bother including ugly _unlocked variants for
everything.

> >> However, (2) is trickier, because the whole API is designed exactly for
> > 
> > I'm not convinced that (2) has any bearing whatsoever on performance
> > unless you're using some hideous malloc implementation that calls mmap
> > for each allocation. Most calls to malloc (when new brk/mmap isn't
> > required) should cost about the same as most calls to getc (when
> > buffer isn't empty).
> 
> This depends very much on your malloc implementation. E.g. on uclibc
> calling malloc guarantees that you do at least one mutex lock/unlock
> operation. And yes, it will give you performance penalty similar to not
> using the _unlocked variants.

Then this should be fixed in uclibc. There's no reason for the mutex
lock/unlock to be called at all if no threads have been created.

> If you want to optimise things, let's do it properly.

That's what I want to see done. "Let's assume everything that
should(*) be fast is slow and everything that should be slow is fast"
is not what I'd call "properly".

Rich


(*) By "should", I mean on a basis of how fast the interface allows it
to be on an implementation that takes advantage of the interface to
optimize rather than just doing the dumbest possible implementation.
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox

Reply via email to