Rounin:

> Thank you for that advice. I'm using GDC because it's available from Ubuntu
> Linux's package system, whereas DMD currently is not. (And the .deb posted on
> digitalmars.com is only for i386.) Hopefully, D will gain more popularity and
> more up-to-date tools will be made available.

On 32-bit systems I find that LDC produces efficient programs, where it can be 
used (the GC is similar; in D2 it's essentially the same).


> By the way, today I re-compiled the program with a "std.gc.enable;" right
> before the final "return 0" statement, and it still runs in 0.68 seconds.

You could also try disabling/enabling the GC in the Python code (and on 32-bit 
systems there's the very good Psyco as well).
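
On the Python side, switching the collector off around the build phase could look like the sketch below. It is only an illustration: build_index() is a hypothetical stand-in for whatever your program builds, and gc_paused() and timed_build() are names I made up. Only gc.disable(), gc.enable() and gc.isenabled() come from the standard library.

```python
import gc
import time
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Turn the cyclic collector off for the enclosed block, then restore it."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Re-enable only if the collector was on before we touched it.
        if was_enabled:
            gc.enable()


def build_index(n):
    # Hypothetical stand-in for the benchmark's build phase:
    # many small container objects are allocated and none are freed.
    return {"path%d" % i: [i, str(i)] for i in range(n)}


def timed_build(n, pause_gc):
    """Build the index with or without the collector, returning (index, seconds)."""
    start = time.perf_counter()
    if pause_gc:
        with gc_paused():
            index = build_index(n)
    else:
        index = build_index(n)
    return index, time.perf_counter() - start


if __name__ == "__main__":
    for pause_gc in (False, True):
        index, seconds = timed_build(200000, pause_gc)
        print("gc paused=%s: %d entries in %.3f s" % (pause_gc, len(index), seconds))
```

Timing both variants on the same input tells you whether the collector matters at all for that phase before you spend time on anything else.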

Your benchmark is not portable, so I can't help you find where the performance 
problem is. When you perform a benchmark it's better to give all source code 
and all data too, to allow others to reproduce the results and look for 
performance problems.

Keep in mind that D associative arrays are usually slower than Python dicts. 
You probably build data structures such as associative arrays, and this slows 
down the GC. If you disable and re-enable the GC around that build phase, the 
program will probably be fast (so I suggest narrowing the disable/enable span 
as much as possible, so you can see where the GC problem is). If you put an 
exit(0) at the end of the program (to skip the final collection) the D program 
may save more time.

In Python 2.7 they have added a GC optimization that may be usable in D too:
http://bugs.python.org/issue4074
>The garbage collector now performs better for one common usage pattern: when 
>many objects are being allocated without deallocating any of them. This would 
>previously take quadratic time for garbage collection, but now the number of 
>full garbage collections is reduced as the number of objects on the heap 
>grows. The new logic only performs a full garbage collection pass when the 
>middle generation has been collected 10 times and when the number of survivor 
>objects from the middle generation exceeds 10% of the number of objects in the 
>oldest generation. (Suggested by Martin von Löwis and implemented by Antoine 
>Pitrou; issue 4074.)<
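
The rule in that note can be sketched in a few lines. This is a toy model of the text quoted above, not the CPython source: the function names, the counters and the "step" abstraction (one step = one point where a full collection is considered) are all my assumptions.

```python
def needs_full_collection(middle_collections, pending_survivors, oldest_objects,
                          every=10, percent=10):
    """Decision rule as worded in the quoted release note (toy model)."""
    if middle_collections < every:
        return False
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return pending_survivors * 100 > percent * oldest_objects


def count_full_collections(steps, survivors_per_step):
    """Simulate a program that only allocates and never frees anything.

    Each step stands for the middle generation having been collected
    10 times, leaving survivors_per_step new long-lived objects.
    """
    oldest_objects = 0      # objects seen by the last full collection
    pending_survivors = 0   # survivors accumulated since then
    full_collections = 0
    for _ in range(steps):
        pending_survivors += survivors_per_step
        if needs_full_collection(10, pending_survivors, oldest_objects):
            full_collections += 1
            oldest_objects += pending_survivors
            pending_survivors = 0
    return full_collections


if __name__ == "__main__":
    for steps in (10, 100, 1000):
        print(steps, "steps ->", count_full_collections(steps, 1000),
              "full collections")
```

In this model the threshold grows with the oldest generation, so full collections become rarer as the heap grows, which is the behaviour the note describes.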


>while D's splitlines() wasn't quite working.<
>int space = std.regexp.find(line, r"\s");<

What do you mean? If there's a bug in splitlines() or split() it's better to 
report it in Bugzilla, ideally with the string to split inlined (no external 
file to read). splitlines() and split() are simple library functions written 
in D, so if there's a problem it's usually not too hard to fix; they are not 
built-in methods written in C as in CPython.


if(!(path in oldpaths) && !(checksum in oldsums))
In D2 this may be written (unfortunately there is no "and" keyword, Walter 
doesn't get its usefulness yet):
if (path !in oldpaths && checksum !in oldsums)
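
For comparison, this is the Python spelling I have in mind when I mention "and". It is just a sketch: is_new_file() is a name I made up, and I'm assuming oldpaths and oldsums are dicts (or sets) keyed by path and by checksum.

```python
def is_new_file(path, checksum, oldpaths, oldsums):
    # True only when neither the path nor the checksum has been seen before.
    return path not in oldpaths and checksum not in oldsums


if __name__ == "__main__":
    oldpaths = {"a.txt": "abc"}
    oldsums = {"abc": "a.txt"}
    print(is_new_file("b.txt", "def", oldpaths, oldsums))
    print(is_new_file("a.txt", "def", oldpaths, oldsums))
```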

Bye,
bearophile
