Rounin:
> Thank you for that advice. I'm using GDC because it's available from Ubuntu
> Linux's package system, whereas DMD currently is not. (And the .deb posted on
> digitalmars.com is only for i386.) Hopefully, D will gain more popularity and
> more up-to-date tools will be made available.

On a 32-bit system I find that LDC produces efficient programs, when you can use it (the GC is similar; in D2 it's essentially the same).

> By the way, today I re-compiled the program with a "std.gc.enable;" right
> before the final "return 0" statement, and it still runs in 0.68 seconds.

You may try to disable/enable the GC in the Python code too (on 32-bit systems there's also the very good Psyco).

Your benchmark is not portable, so I can't help you find where the performance problem is. When you perform a benchmark it's better to give all the source code and all the data too, to allow others to reproduce the results and look for performance problems.

Keep in mind that D associative arrays are usually slower than Python dicts. You probably build data structures like associative arrays, and this slows down the GC. If you disable and re-enable the GC around that build phase, the program will probably be fast (so I suggest you narrow the disable/enable span as much as possible, so you can see where the GC problem is). If you put an exit(0) at the end of the program (to skip the final collection) the D program may save more time.

In Python 2.7 they have added a GC optimization that may be usable in D too:
http://bugs.python.org/issue4074

>The garbage collector now performs better for one common usage pattern: when
>many objects are being allocated without deallocating any of them. This would
>previously take quadratic time for garbage collection, but now the number of
>full garbage collections is reduced as the number of objects on the heap
>grows. The new logic only performs a full garbage collection pass when the
>middle generation has been collected 10 times and when the number of survivor
>objects from the middle generation exceeds 10% of the number of objects in the
>oldest generation. (Suggested by Martin von Löwis and implemented by Antoine
>Pitrou; issue 4074.)<

>while D's splitlines() wasn't quite working.<
>int space = std.regexp.find(line, r"\s");<

What do you mean?
If there's a bug in splitlines() or split() it's better to add it to Bugzilla, possibly with the string to split inlined in the report (no external file to read). splitlines() and split() are simple module functions written in D, so if there's a problem it's usually not too hard to fix; they are not built-in methods written in C as in CPython.

if(!(path in oldpaths) && !(checksum in oldsums))

In D2 this may be written (unfortunately there is no "and" keyword, Walter doesn't get its usefulness yet):

if (path !in oldpaths && checksum !in oldsums)

Bye,
bearophile
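
P.S. For the Python side, the disable/enable idea around the build phase could look roughly like the sketch below. It assumes Python 3 (for time.perf_counter), and build_table and timed_build are hypothetical names standing in for whatever your script does while it fills its dicts. It is only meant to show how narrow the disable/enable span can be kept, not to reproduce your benchmark.

```python
import gc
import time


def build_table(n):
    """Hypothetical build phase: fill a dict with many small tuples."""
    table = {}
    for i in range(n):
        table["path/%d" % i] = (i, "sum%d" % i)
    return table


def timed_build(n, disable_gc):
    """Return (table, seconds); optionally pause the cyclic GC during the build."""
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    start = time.perf_counter()
    try:
        table = build_table(n)
    finally:
        elapsed = time.perf_counter() - start
        # Restore the collector only if this function switched it off.
        if disable_gc and was_enabled:
            gc.enable()
    return table, elapsed


if __name__ == "__main__":
    for flag in (False, True):
        _, secs = timed_build(200000, flag)
        print("gc disabled during build: %s -> %.3f s" % (flag, secs))
```

If the two timings differ a lot, the collector is probably being triggered repeatedly while the dict grows; if they don't, the time is going somewhere else.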
