On Thursday, 9 February 2017 at 19:39:49 UTC, Nestor wrote:
On Thursday, 9 February 2017 at 18:34:30 UTC, Era Scarecrow wrote:
On Thursday, 9 February 2017 at 17:36:11 UTC, Nestor wrote:
I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome:

import std.stdio;

static immutable char[] QG10Matrix =
  "03175986427092154863420687135917509834266123045978" ~
  "36742095815869720134894536201794386172052581436790";

char checkDigit(string str) {
  char tmpdigit = '0';
  foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10];
  return tmpdigit;
}
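
By the way, a quick sanity check with the standard Damm test vector (572 gets check digit 4, and a number with its check digit appended maps back to '0') could look like this, assuming I read the algorithm right:

unittest {
  // standard Damm example: the check digit of 572 is 4
  assert(checkDigit("572") == '4');
  // appending the check digit makes the number validate to '0'
  assert(checkDigit("5724") == '0');
}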

Well, one thing is you can probably reduce them from chars to just bytes; instead of having to subtract '0' every time, you can add it back once at the end. Although unless you're working with a VERY large input you won't see a difference.

Actually, since you're also multiplying by 10, you can incorporate that into the table too (although a mixin might be better for the conversion than doing it by hand; see the sketch after the code below):


 static immutable ubyte[] QG10Matrix = [
     0,30,10,70,50,90,80,60,40,20,
    70, 0,90,20,10,50,40,80,60,30,
    40,20, 0,60,80,70,10,30,50,90,
    10,70,50, 0,90,80,30,40,20,60,
    60,10,20,30, 0,40,50,90,70,80,
    30,60,70,40,20, 0,90,50,80,10,
    50,80,60,90,70,20, 0,10,30,40,
    80,90,40,50,30,60,20, 0,10,70,
    90,40,30,80,60,10,70,20, 0,50,
    20,50,80,10,40,30,60,70,90, 0];

 char checkDigit(string str) {
   ubyte tmpdigit = 0;
   // entries are premultiplied by 10, so tmpdigit doubles as the row offset
   foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + tmpdigit];
   return cast(char)((tmpdigit / 10) + '0');
 }
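
And rather than converting the table by hand, you could derive it from the original digit string at compile time. A rough sketch (I'm using a CTFE-evaluated function literal here instead of a string mixin; QG10MatrixX10 is just a name I made up):

static immutable ubyte[100] QG10MatrixX10 = () {
    immutable string src =
        "03175986427092154863420687135917509834266123045978" ~
        "36742095815869720134894536201794386172052581436790";
    ubyte[100] tbl;
    // premultiply each entry by 10 so the row offset is baked into the table
    foreach (i, c; src) tbl[i] = cast(ubyte)((c - '0') * 10);
    return tbl;
}();

That way the two representations can't drift apart.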



OK, I changed the approach, using a multidimensional array for the matrix so I could ditch the arithmetic operations altogether. But curiously, after measuring a few thousand runs of both implementations with avgtime, I see no noticeable difference. Why?

import std.stdio;

static immutable ubyte[][] QG10Matrix = [
  [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3],
  [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6],
  [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1],
  [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7],
  [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0],
];

static int charToInt(char chr) {
  // map '0'..'9' to 0..9; flag anything else with -1 (the caller's table
  // lookup will then fail with a RangeError)
  if (chr < '0' || chr > '9') return -1;
  return chr - '0';
}

ubyte checkDigit(string str) {
  ubyte tmpdigit;
  foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)];
  return tmpdigit;
}

enum {
  EXIT_SUCCESS = 0,
  EXIT_FAILURE = 1,
}

int main(string[] args) {
  // validate the input up front (a scope(failure) guard is not allowed
  // to return from the function)
  import std.algorithm : all;
  import std.ascii : isDigit;
  if (args.length != 2 || !args[1].all!isDigit) {
    writeln("Invalid arguments. You must pass a number.");
    return EXIT_FAILURE;
  }
  ubyte digit = checkDigit(args[1]);
  if (digit == 0) writefln("%s is a valid number.", args[1]);
  else {
    writefln("%s is not a valid number (but it would be, appending digit %s).",
      args[1], digit);
  }

  return EXIT_SUCCESS;
}
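
(As a quick sanity check with the usual Damm example: passing 572 reports that appending 4 would make it valid, while 5724 is accepted as valid.)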

The question is, why do you expect it to be noticeably faster? On modern hardware, the optimizations are such that so small a change in code is very hard to link to a difference in running time. If you really want to show one, the following code does the benchmark the right way (i.e. using a very long input, tens of thousands of runs, and avoiding I/O and program load time, so as to compare the bare function implementations):

import std.conv;
import std.stdio;
import std.range;
import std.array;
import std.algorithm;

string testcase;

static immutable char[] QG10MatrixOne =
  "03175986427092154863420687135917509834266123045978" ~
  "36742095815869720134894536201794386172052581436790";

char checkDigitOne(string str) {
  char tmpdigit = '0';
  foreach(chr; str) tmpdigit = QG10MatrixOne[(chr - '0') + (tmpdigit - '0') * 10];
  return tmpdigit;
}

void testCheckDigitOne() {
    checkDigitOne(testcase);
}

static immutable ubyte[][] QG10MatrixTwo = [
  [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3],
  [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6],
  [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1],
  [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7],
  [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0],
];

static int charToInt(char chr) {
  // map '0'..'9' to 0..9; flag anything else with -1 (the caller's table
  // lookup will then fail with a RangeError)
  if (chr < '0' || chr > '9') return -1;
  return chr - '0';
}

ubyte checkDigitTwo(string str) {
  ubyte tmpdigit;
  foreach(chr; str) {
      tmpdigit = QG10MatrixTwo[tmpdigit][charToInt(chr)];
  }
  return tmpdigit;
}

void testCheckDigitTwo() {
    checkDigitTwo(testcase);
}

void main(string[] args) {
  testcase = iota(10).cycle.take(100000).map!(to!string).array.join;

  import std.datetime: benchmark;
  benchmark!(testCheckDigitOne, testCheckDigitTwo)(10000).each!writeln;
}


On my machine I get the following times (the absolute numbers depend on my hardware; what matters is the difference):

TickDuration(15785255852)
TickDuration(15784826803)

So it seems that the first version is slower than the second one, but by so little that it's hard to attribute the difference to the implementation. If anything, it shows the futility of such low-level tricks as meaningless micro-optimization. And by the way, note that the benchmark avoids measuring I/O, because we were interested in the functions. If we were measuring it, I/O would represent about four fifths of the running time (on my machine, with numbers as long as the one used in the benchmark). This means even a 20% improvement in checkDigit would only translate to a 4% improvement of the program.
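
To spell out that last bit of arithmetic (the four-fifths I/O share is my rough observation, not a precise measurement), a throwaway sketch:

import std.stdio;

void main() {
    // if checkDigit accounts for ~1/5 of total runtime, a 20% speedup of
    // checkDigit alone shrinks the whole program by 0.2 * 0.2 = 0.04, i.e. ~4%
    immutable computeShare = 0.2; // fraction of runtime spent in checkDigit
    immutable localGain = 0.2;    // hypothetical 20% improvement
    writefln("overall improvement: %.0f%%", computeShare * localGain * 100);
}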
