On Tuesday, 8 August 2017 at 16:00:17 UTC, Steven Schveighoffer wrote:
On 8/8/17 11:28 AM, Guillaume Chatelet wrote:
Let's say I'm processing megabytes of data: I'm lazily iterating over the incoming lines and storing data in an associative array. I don't want to copy unless I have to.

Contrived example follows:

input file
----------
a,b,15
c,d,12
....

Efficient ingestion
-------------------
import std.format : formattedRead;
import std.stdio : stdin, writeln;

void main() {

   size_t[string][string] indexed_map;

   foreach(char[] line ; stdin.byLine) {
     char[] a;
     char[] b;
     size_t value;
     line.formattedRead!"%s,%s,%d"(a,b,value);

     // Look up with the mutable key first; only copy (idup) it when inserting.
     auto pA = a in indexed_map;
     if(pA is null) {
       pA = &(indexed_map[a.idup] = (size_t[string]).init);
     }

     auto pB = b in (*pA);
     if(pB is null) {
       pB = &((*pA)[b.idup] = size_t.init);
     }

     // Technically unneeded but let's say we have more than 2 dimensions.
     (*pB) = value;
   }

   indexed_map.writeln;
}


I'd describe this code as ugly but fast. Any idea how to make it less ugly? Is there something in Phobos that could help?

I wouldn't use formattedRead, as I think this is going to allocate temporaries for a and b.
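A minimal sketch of one allocation-free alternative, assuming comma-separated lines with exactly three fields: split the line with Phobos' `std.algorithm.iteration.splitter` so that `a` and `b` are slices into the line buffer, and convert only the number with `std.conv.to`. The `parseLine` helper is hypothetical (not a Phobos function), and a malformed numeric field makes `to` throw a `ConvException`.

```d
import std.algorithm.iteration : splitter;
import std.conv : to;

/// Hypothetical helper: splits "a,b,15" into three fields without allocating.
/// `a` and `b` are slices of `line`, not copies. Extra fields are ignored.
bool parseLine(const(char)[] line, out const(char)[] a,
               out const(char)[] b, out size_t value)
{
    auto fields = line.splitter(',');
    if (fields.empty) return false;
    a = fields.front;
    fields.popFront();
    if (fields.empty) return false;
    b = fields.front;
    fields.popFront();
    if (fields.empty) return false;
    value = fields.front.to!size_t; // throws ConvException on bad input
    return true;
}

unittest
{
    char[] line = "a,b,15".dup; // mutable, like the buffer byLine hands out
    const(char)[] a, b;
    size_t value;
    assert(parseLine(line, a, b, value));
    assert(a == "a" && b == "b" && value == 15);
    assert(a.ptr == line.ptr);  // `a` aliases `line`, no copy was made
}
```

Because `a` and `b` alias the buffer that `byLine` reuses, they are only valid until the next line is read, so the lookup-then-`idup` pattern in the code above still applies when storing them as keys.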

Note that this is very close to Jon Degenhardt's blog post from May: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/

-Steve

I haven't dug into formattedRead yet, but thanks for letting me know :) I was mostly asking about the AA pattern. I guess the best I can do is write a templated function to hide the ugliness.


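import std.format : formattedRead;
import std.stdio : stdin, writeln;

// Returns a reference to map[key], inserting Value.init first if the key is
// missing. The key is only copied (idup) when a new entry is created.
ref Value GetWithDefault(Value)(ref Value[string] map, const(char)[] key) {
  auto pValue = key in map;
  if(pValue) return *pValue;
  return map[key.idup] = Value.init;
}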

void main() {

  size_t[string][string] indexed_map;

  foreach(char[] line ; stdin.byLine) {
    char[] a;
    char[] b;
    size_t value;
    line.formattedRead!"%s,%s,%d"(a,b,value);

    indexed_map.GetWithDefault(a).GetWithDefault(b) = value;
  }

  indexed_map.writeln;
}
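A quick self-contained check of why the helper only needs to `idup` on insertion, assuming a mutable `char[]` buffer stands in for the one `byLine` reuses. The lowercase `getWithDefault` is just a copy of the helper above, renamed for this sketch; like the original, it relies on `in` accepting a `const(char)[]` key for a `string`-keyed AA. It can be run with e.g. `dmd -unittest -main -run`.

```d
// Copy of the GetWithDefault helper above, renamed for this sketch.
ref Value getWithDefault(Value)(ref Value[string] map, const(char)[] key)
{
    // Lookup with a non-immutable key does not allocate.
    if (auto p = key in map)
        return *p;
    // Only a missing key is copied into immutable storage.
    return map[key.idup] = Value.init;
}

unittest
{
    size_t[string][string] m;
    char[] bufA = "a".dup; // stands in for the buffer byLine reuses
    char[] bufB = "b".dup;

    m.getWithDefault(bufA).getWithDefault(bufB) = 15;
    bufA[0] = 'x';         // overwrite the buffers, as reading the next line would
    bufB[0] = 'y';

    assert(m.length == 1);
    assert(("x" in m) is null);
    assert(m["a"]["b"] == 15); // stored keys were copied, so they are unaffected

    // Existing keys return the same slot; nothing new is inserted.
    m.getWithDefault("a").getWithDefault("b") += 1;
    assert(m["a"]["b"] == 16);
    assert(m.length == 1);
}
```

Taking the map by `ref` matters here: the inner AA starts out null, and inserting into it through the reference is what makes the newly created inner table visible in the outer map.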


Not too bad, actually!
