Hi Achim,

Achim Gratz wrote:
I've been experimenting a bit with ZStandard dictionaries.  The
dictionary builder is probably not the most optimized piece of software

Is this what leads you to suspect malloc?  Really heavy use of malloc?

and if you feed it large amounts of data it needs quite a lot of
cycles.  So I thought I'd run some of this on Cygwin since that machine is
faster and has more threads than my Linux box.  Unfortunately that plan
shattered due to extreme slowness of the first (single-threaded) part of
the dictionary builder that sets up the partial suffix array.

|------+---------------+---------------|
|      | E3-1225v3     | E3-1276v3     |
|      | 4C/4T         | 4C/8T         |
|      | 3.2/3.6GHz    | 3.6/4.0GHz    |
|------+---------------+---------------|
|  100 | 00:14 /   55s | 00:23 /  126s |
|  200 | 00:39 /  145s | 01:10 /  241s |
|  400 | 01:12 /  266s | 01:25 /  322s |
|  800 | 02:06 /  466s | 11:12 / 1245s |
| 1600 | 03:57 /  872s | > 2hr         |
| 3200 | 08:03 / 1756s | n/a           |
| 6400 | 16:17 / 3581s | n/a           |
|------+---------------+---------------|

The obvious difference is that I/O takes a lot longer on Cygwin (roughly
a minute for reading all the data) and that I have an insane amount of
page faults on Windows (as reported by time) vs. none on Linux.

How much RAM does the Windows machine have? Do you have a paging file? Is it fixed size or "let Windows manage"? How big is it?
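
To see whether the faults track allocation activity rather than file I/O, one option is to read the same counters that time reports from inside the program, before and after a suspect phase. The following is only a sketch under stated assumptions: read_fault_counts(), churn() and print_fault_delta() are hypothetical helper names, churn() is a stand-in workload and not the ZStandard dictionary builder, and which rusage fields a given platform actually fills in may differ.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Snapshot of the page-fault counters kept for this process. */
struct fault_counts {
    long minor;  /* ru_minflt: faults serviced without I/O */
    long major;  /* ru_majflt: faults that required I/O */
};

/* Fill *out from getrusage(RUSAGE_SELF); returns 0 on success, -1 on error. */
int read_fault_counts(struct fault_counts *out)
{
    struct rusage ru;

    assert(out != NULL);
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->minor = ru.ru_minflt;
    out->major = ru.ru_majflt;
    return 0;
}

/* Hypothetical workload: allocate, touch and free `rounds` buffers. */
void churn(size_t rounds, size_t bytes)
{
    for (size_t i = 0; i < rounds; i++) {
        char *p = malloc(bytes);
        if (p == NULL)
            return;
        memset(p, 1, bytes);  /* touch every page so it is really mapped */
        free(p);
    }
}

/* Print how many faults occurred between two snapshots. */
void print_fault_delta(const struct fault_counts *before,
                       const struct fault_counts *after)
{
    printf("minor faults: %ld, major faults: %ld\n",
           after->minor - before->minor,
           after->major - before->major);
}
```

Take one snapshot with read_fault_counts() before the phase of interest and one after it, then pass both to print_fault_delta(). If the delta grows with the number of malloc/free rounds, the allocator is a plausible suspect.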

While doing that I also noticed that top shows the program taking 100%
CPU in the multithreaded portion of the program, while it should show
close to 800% at that time.  I'm not sure if that information just isn't
available on Windows or if procps-ng needs to look someplace else for
that to be shown as expected.

No offense, but are you sure it's actually running multi-threaded on Windows?
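
A check that does not depend on what top displays is to compare process CPU time against wall-clock time over the same interval: a ratio near 1 suggests one busy thread, a ratio near 8 suggests eight. This is a minimal sketch, assuming both POSIX clocks are available on the platform; take_sample() and busy_cores() are hypothetical helper names, not part of ZStandard or Cygwin.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Seconds elapsed from *start to *end. */
double timespec_diff_sec(const struct timespec *start,
                         const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Paired readings of the wall clock and the process CPU clock. */
struct clock_sample {
    struct timespec wall;
    struct timespec cpu;
};

/* Read both clocks; returns 0 on success, -1 on error. */
int take_sample(struct clock_sample *s)
{
    assert(s != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &s->wall) != 0)
        return -1;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &s->cpu) != 0)
        return -1;
    return 0;
}

/* Average number of busy cores between two samples.
 * Returns -1.0 if no wall-clock time has elapsed. */
double busy_cores(const struct clock_sample *before,
                  const struct clock_sample *after)
{
    double wall = timespec_diff_sec(&before->wall, &after->wall);
    double cpu = timespec_diff_sec(&before->cpu, &after->cpu);

    return wall > 0.0 ? cpu / wall : -1.0;
}
```

Call take_sample() just before and just after the multi-threaded phase and feed both samples to busy_cores(). The result is an average over the whole interval, so a short parallel burst inside a long serial phase will still read close to 1.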

I have a Cygwin malloc speedup patch that *might* help the m-t part. I'll prepare and submit that to cygwin-patches shortly.

Cheers,

..mark
