On 8/8/17 11:28 AM, Guillaume Chatelet wrote:
Let's say I'm processing megabytes of data: I'm lazily iterating over the
incoming lines and storing data in an associative array. I don't want to
copy unless I have to.
Contrived example follows:
input file
----------
a,b,15
c,d,12
....
Efficient ingestion
-------------------
import std.format : formattedRead;
import std.stdio : stdin, writeln;

void main() {
    size_t[string][string] indexed_map;
    foreach(char[] line ; stdin.byLine) {
        char[] a;
        char[] b;
        size_t value;
        line.formattedRead!"%s,%s,%d"(a,b,value);
        // Look up with the mutable slice; idup the key only when inserting.
        auto pA = a in indexed_map;
        if(pA is null) {
            pA = &(indexed_map[a.idup] = (size_t[string]).init);
        }
        auto pB = b in (*pA);
        if(pB is null) {
            pB = &((*pA)[b.idup] = size_t.init);
        }
        // Technically unneeded but let's say we have more than 2 dimensions.
        (*pB) = value;
    }
    indexed_map.writeln;
}
I qualify this code as ugly but fast. Any idea on how to make this less
ugly? Is there something in Phobos to help?
I wouldn't use formattedRead, as I think this is going to allocate
temporaries for a and b.
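
A minimal sketch of one way to avoid that, assuming every non-empty line has exactly three comma-separated fields and the last one is a valid unsigned integer. It slices the line buffer with findSplit instead of calling formattedRead, so a and b are views into the byLine buffer and the only copies are the idup calls on first insertion. The store function is a hypothetical helper written for this example, not something from Phobos, and I have not benchmarked it.

```d
import std.algorithm.searching : findSplit;
import std.conv : to;
import std.stdio : stdin, writeln;

// Hypothetical helper (not part of Phobos): sets map[a][b] = value,
// duplicating a key only when it is not already present.
void store(ref size_t[string][string] map,
           const(char)[] a, const(char)[] b, size_t value)
{
    auto inner = a in map;          // lookup with a slice, no allocation
    if (inner is null)
    {
        map[a.idup] = (size_t[string]).init;
        inner = a in map;
    }
    if (auto p = b in *inner)
        *p = value;                 // key exists: overwrite in place
    else
        (*inner)[b.idup] = value;   // first time: copy the key once
}

void main()
{
    size_t[string][string] indexedMap;
    foreach (char[] line; stdin.byLine)
    {
        if (line.length == 0)
            continue;               // assumption: blank lines are skipped
        // Each findSplit returns slices of `line`; nothing is copied here.
        auto first = line.findSplit(",");
        auto rest = first[2].findSplit(",");
        store(indexedMap, first[0], rest[0], rest[2].to!size_t);
    }
    indexedMap.writeln;
}
```

Pulling the lookup-or-insert logic into one function also takes care of most of the ugliness: extending to more dimensions means adding one more level inside store rather than repeating the pointer dance in the loop body. A malformed line will make to!size_t throw, so real input would need some error handling.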
Note, this is very close to Jon Degenhardt's blog post in May:
https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
-Steve