I did a performance study on speeding up case conversion in std.uni.asLowerCase. Specifics for asLowerCase have been added to issue https://issues.dlang.org/show_bug.cgi?id=11229. Publishing here as some of the more general observations may be of wider interest.

Background - Case conversion can generally be sped up by checking whether a character is ASCII before invoking a full Unicode case conversion. The single-character std.uni.toLower does this optimization, but std.uni.asLowerCase does not. asLowerCase does a lazy conversion of a range. For the test, I created a replacement for asLowerCase, called mapAsLowerCase below, which uses map and toLower. In essence, `map!(x => x.toLower)` or `map!(x => x.byDchar.toLower)`.
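For readers who want to see the shape of the replacement, here is a minimal sketch, assuming the input is a range whose elements convert to dchar. The function name matches the one used in this post, but the body is my own illustration and not the code from the dpaste link further down.

```d
import std.algorithm.iteration : map;
import std.range.primitives : ElementType, isInputRange;
import std.uni : toLower;

/// Illustrative stand-in for mapAsLowerCase (assumed shape, not the original code).
/// Lazily lower-cases each element of a dchar-compatible range.
auto mapAsLowerCase(R)(R r)
if (isInputRange!R && is(ElementType!R : dchar))
{
    // Per the post, the single-character toLower checks for ASCII
    // before falling back to the full Unicode conversion.
    return r.map!(c => c.toLower);
}

unittest
{
    import std.algorithm.comparison : equal;
    import std.utf : byDchar;

    // A string is auto-decoded to dchar by the range primitives.
    assert(mapAsLowerCase("HeLLo").equal("hello"));
    // An explicitly decoded range works the same way.
    assert("ABC".byDchar.mapAsLowerCase.equal("abc"));
}
```

Compiling with `-unittest -main` runs the checks. Like asLowerCase, the result is lazy, so nothing is converted until the range is walked.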

Testing was with DMD (2.071) and LDC 1.0.0-beta1 (Phobos 2.070) on OSX. Compiler settings were `-release -O -boundscheck=off`. DMD was tested with and without `-inline`. LDC turns on inlining (-enable-inlining=1) by default with -O, but DMD does not. Texts tried were in Japanese, Chinese, Finnish, English, German, and Spanish. Timing was done both including and excluding decoding from utf-8 to dchar.
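A rough sketch of how such a comparison could be timed follows. The names `checksum`, `Timing`, and `timeLowering` are hypothetical helpers made up for this illustration, and `std.datetime.stopwatch` is the module in recent Phobos releases, not necessarily what the 2.070/2.071 timing program used.

```d
import core.time : Duration;
import std.algorithm.iteration : map;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.uni : asLowerCase, toLower;

/// Walks the range so the lazy conversion actually runs, and returns a
/// sum of code points so the optimizer cannot discard the work.
size_t checksum(R)(R r)
{
    size_t sum = 0;
    foreach (dchar c; r)
        sum += c;
    return sum;
}

/// Elapsed time plus the accumulated checksum (hypothetical helper type).
struct Timing
{
    Duration elapsed;
    size_t sum;
}

/// Times `iterations` passes of the `lower` conversion over `text`.
Timing timeLowering(alias lower)(string text, size_t iterations)
{
    size_t sum = 0;
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. iterations)
        sum += checksum(lower(text));
    sw.stop();
    return Timing(sw.peek, sum);
}

unittest
{
    enum text = "HeLLo Wörld ÄÖ";
    auto a = timeLowering!(t => t.asLowerCase)(text, 3);
    auto b = timeLowering!(t => t.map!(c => c.toLower))(text, 3);
    // Both conversions should yield the same characters.
    assert(a.sum == b.sum);
}
```

Passing a string, as here, includes the utf-8 decoding in the measurement. One way to exclude it would be to decode the text to a dstring once up front and time the conversion over that instead, though this is an assumption about method, not a description of the original program.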

Performance delta including decoding to dchar:
| Language group  | Pct Ascii | LDC gain   | DMD gain  | DMD no inline |
|-----------------+-----------+------------+-----------+---------------|
| Latin           | 95-99%    | 64% (2.7x) | 93% (14x) | 48% (1.9x)    |
| Asian (Jpn/Chn) | 2.4-3.7%  | 36% (1.6x) | 80% (5x)  | -1%           |

Performance delta excluding decoding to dchar:
| Language group  | Pct Ascii | LDC gain   | DMD gain  | DMD no inline |
|-----------------+-----------+------------+-----------+---------------|
| Latin           | 95-99%    | 60% (2.5x) | 95% (20x) | 60% (2.5x)    |
| Asian (Jpn/Chn) | 2.4-3.7%  | 50% (2x)   | 95% (20x) | -2%           |
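A note on reading the tables: the percentages and the "x" factors appear to be two views of the same measurement, with the percentage being the reduction in run time and the factor being old time divided by new time. That relationship is inferred from the numbers, not stated anywhere, and `speedupFactor` is just an illustrative name.

```d
/// Converts a percent reduction in run time into a speedup factor.
/// Assumes gain = (oldTime - newTime) / oldTime, which is inferred
/// from the tables and not a stated definition.
double speedupFactor(double gainPercent)
{
    return 1.0 / (1.0 - gainPercent / 100.0);
}

unittest
{
    import std.math : abs;

    assert(abs(speedupFactor(80) - 5.0) < 0.01);   // table entry "80% (5x)"
    assert(abs(speedupFactor(95) - 20.0) < 0.01);  // table entry "95% (20x)"
    assert(abs(speedupFactor(50) - 2.0) < 0.01);   // table entry "50% (2x)"
}
```

Under the same reading, a negative entry such as -1% is a factor just below 1x, meaning the replacement was slightly slower in that configuration.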

Observations:
* mapAsLowerCase was faster than asLowerCase across the board. That it was also faster for Asian texts suggests the improvement involved more than just the ASCII check optimization.
* Performance varied widely between compilers and, for DMD, depending on whether the -inline flag was included. The performance delta between asLowerCase and the mapAsLowerCase replacement was very dependent on these choices. Similarly, the delta between inclusion and exclusion of auto-decoding was highly dependent on these selections.
* DMD improvement from using -inline: 30% for asLowerCase (1.5x), 90% for mapAsLowerCase (10x).
* DMD (-inline) vs LDC: For asLowerCase, LDC was 65-85% faster. For mapAsLowerCase, DMD was 10-40% faster. There were changes to the map implementation in 2.071, so these were not equivalent, but still, it's interesting that DMD beat LDC in this case.

Thoughts:
* The large variances between compiler settings call for extra diligence when performance tuning at the source code level, especially for code intended for multiple compilers.
* Perhaps DMD -O should also turn on -inline. This would present a better performance picture to new users. It's also helpful when the different compilers agree on the rough meaning of compiler switches.
* Auto-decoding is an oft-discussed concern. It doesn't show up in the tables above, but the data I looked at suggests the cost/penalty may vary quite a bit depending on usage context and compiler/settings. I wasn't studying this aspect explicitly. It may be worth its own analysis.

Other details:
* Code for mapAsLowerCase and the timing program is at: https://dpaste.dzfl.pl/a0e2fa1c71fd
* Texts used for timing were books in several languages from the Project Gutenberg site (http://www.gutenberg.org/), with boilerplate text removed.

--Jon
