Well, there's some progress.  I have a cygwin build, and it's every
bit as slow as Vivian says it is.  And I've kinda/sorta isolated the
problem.  Here's a program:

   // g++ -o aptdat aptdat.cc -I$FG/SimGear -L$FG/lib -lsgmisc -lz
   #include <string>
   #include <vector>
   #include <simgear/misc/sgstream.hxx>
   #include <simgear/misc/strutils.hxx>
   using std::string;
   using std::vector;
   const int bufsz = 2048;
   char buf[bufsz];
   int main(int argc, char** argv)
   {
       sg_gzifstream f(argv[1]);   // gzip-aware input stream
       vector<string> toks;
       while (!f.eof()) {
           f.getline(buf, bufsz);
           toks = simgear::strutils::split(buf);   // allocates per token
       }
       return 0;
   }

This basically duplicates the read loop in the apt.dat.gz loader.  It
runs in 2 seconds on my linux box, but takes 40 (!) seconds on the
windows/cygwin machine.  Bizarre.

Even weirder: comment out the split() call, which is 100% CPU-bound
(let me say that again: split() does no I/O and makes no direct calls
to the OS kernel), and the runtime goes down to 1.3 seconds.  Huh?

Here's one more piece of evidence: you can watch this program in
strace, which has a nice microseconds-since-the-last-call field.  It
basically opens up the file and starts reading 1024 bytes at a time.
Most of these calls take about 10-100us per 1k chunk.  But every 10 or
so chunks (and at very regular intervals -- keep this in mind, it's
important), one of the readv() calls takes something like 8-10
*thousand* microseconds.

Note to OS junkies: Quick, before reading further, do you see
anything special about time values in that range?

So here's the hypothesis: the split routine does no I/O, but it *does*
need to allocate memory.  The heap needs to be threadsafe, obviously,
so some heap operations need to be synchronized.  What if after every
N allocations (which are, in this context, very regular with respect
to the number of bytes read from the file), the heap needs to do
something that requires a lock?  And what if the synchronization
mechanism requires a spinlock or timeout?  (Which is sadly necessary
in windows, as the win32 API has no condition variables).  Well, then
the process will need to wait for the next CPU timeslice to run.  And
the CPU scheduler on WinXP just happens to tick at (I think) 100 Hz,
i.e. one timeslice every 10,000 microseconds.  Hm...
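
One way to poke at this hypothesis without any file I/O at all is to
time small bursts of heap allocation and count how many of them take
roughly a timeslice.  The following is only a sketch: the names
(StallStats, probe_alloc_stalls), the burst size of 8 strings, and the
use of std::chrono (which needs a C++11 compiler) are my own
assumptions, not anything taken from SimGear or cygwin.  The clock
resolution on a given platform may also be too coarse to trust the
small numbers.

```cpp
#include <chrono>
#include <string>
#include <vector>

// Result of one probe run. All names here are illustrative.
struct StallStats {
    long iterations;     // number of allocation bursts timed
    long stalls;         // bursts that took at least threshold_us
    long long worst_us;  // slowest single burst, in microseconds
};

// Time `iters` bursts of small heap allocations, roughly what a
// per-line split() does, and count the bursts that took at least
// `threshold_us` microseconds.
StallStats probe_alloc_stalls(long iters, long long threshold_us)
{
    typedef std::chrono::steady_clock clock;
    StallStats st = { iters, 0, 0 };
    for (long i = 0; i < iters; ++i) {
        clock::time_point t0 = clock::now();
        {
            std::vector<std::string> toks;
            // 32 characters is long enough to exceed typical
            // small-string buffers, so each string hits the heap.
            for (int k = 0; k < 8; ++k)
                toks.push_back(std::string(32, 'x'));
        }   // toks is destroyed here, freeing everything again
        long long us = std::chrono::duration_cast<
            std::chrono::microseconds>(clock::now() - t0).count();
        if (us > st.worst_us) st.worst_us = us;
        if (us >= threshold_us) ++st.stalls;
    }
    return st;
}
```

Calling probe_alloc_stalls(1000000, 5000) on both machines and
comparing the stalls and worst_us fields would be the experiment.  If
the hypothesis is right, the cygwin run should show stalls clustered
near 10,000 microseconds and the linux run should show few or none.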

The bottom line is that this is a cygwin bug and we can't fix it,
sorry.  Hopefully someone with more love for this platform than I
(I've done my time, heh) can forward this to them and get it fixed.
It would need to eliminate the (red herring) gzip ifstream interface
in favor of plain old fgets(), obviously, and should include the split
code inline.  But the performance problem is clear as day: it's 20x
slower than exactly the same program run under linux.
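
For whoever picks this up, a standalone test case along those lines
might look roughly like the sketch below.  It uses plain fgets() and
an inline whitespace splitter, so it needs neither SimGear nor zlib.
The names split_ws and read_and_split are made up for illustration,
and the splitter is my approximation of what a whitespace split does,
not a copy of the SimGear code.  The input would have to be an
already-gunzipped copy of apt.dat.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Split on runs of space, tab, CR and LF. Each token becomes a fresh
// std::string, so every input line costs several heap allocations.
std::vector<std::string> split_ws(const char* s)
{
    std::vector<std::string> toks;
    const char* p = s;
    while (*p) {
        // Skip leading whitespace.
        while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n') ++p;
        const char* start = p;
        // Advance to the end of the token.
        while (*p && *p != ' ' && *p != '\t' && *p != '\r' && *p != '\n') ++p;
        if (p != start) toks.push_back(std::string(start, p - start));
    }
    return toks;
}

// Read the file line by line and split each line, like the loop in
// the original program. Returns the total token count, or -1 if the
// file cannot be opened.
long read_and_split(const char* path)
{
    std::FILE* f = std::fopen(path, "r");
    if (!f) return -1;
    static char buf[2048];
    long ntoks = 0;
    while (std::fgets(buf, sizeof buf, f))
        ntoks += static_cast<long>(split_ws(buf).size());
    std::fclose(f);
    return ntoks;
}
```

A main() that just calls read_and_split(argv[1]) and prints the result
completes the program.  Timing that on both platforms, with and
without the split_ws() call, should show whether the slowdown
survives once the gzip stream is out of the picture.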

I have to wonder how many other "cygwin is slow!" reports (not just
FlightGear's) are related to this issue...


Flightgear-devel mailing list