On Tuesday, 8 August 2017 at 16:00:17 UTC, Steven Schveighoffer wrote:
On 8/8/17 11:28 AM, Guillaume Chatelet wrote:
Let's say I'm processing megabytes of data: I'm lazily iterating
over the incoming lines and storing the data in an associative
array. I don't want to copy unless I have to.
A contrived example follows:
input file
----------
a,b,15
c,d,12
....
Efficient ingestion
-------------------
import std.format : formattedRead;
import std.stdio : stdin, writeln;

void main() {
    size_t[string][string] indexed_map;
    foreach(char[] line ; stdin.byLine) {
        char[] a;
        char[] b;
        size_t value;
        line.formattedRead!"%s,%s,%d"(a,b,value);
        // Look up with the mutable slice; idup only when the key is new.
        auto pA = a in indexed_map;
        if(pA is null) {
            pA = &(indexed_map[a.idup] = (size_t[string]).init);
        }
        auto pB = b in (*pA);
        if(pB is null) {
            pB = &((*pA)[b.idup] = size_t.init);
        }
        // Technically unneeded but let's say we have more than 2 dimensions.
        (*pB) = value;
    }
    indexed_map.writeln;
}
I qualify this code as ugly but fast. Any idea on how to make
this less ugly? Is there something in Phobos to help?
I wouldn't use formattedRead, as I think this is going to
allocate temporaries for a and b.
Note, this is very close to Jon Degenhardt's blog post in May:
https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
-Steve
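For what it's worth, one way to sidestep the formattedRead concern might be splitter from std.algorithm, which is lazy and hands back slices of the line rather than newly allocated arrays. A minimal sketch, assuming every non-empty line has exactly three well-formed fields; splitFields is a made-up helper name, not something from Phobos:

```d
import std.algorithm.iteration : splitter;
import std.conv : to;
import std.stdio : stdin, writeln;

// Hypothetical helper: splits "a,b,15" into two slices of `line` and a number.
// Assumes exactly three comma-separated fields; malformed input is not handled.
void splitFields(char[] line, out char[] a, out char[] b, out size_t value) {
    auto fields = line.splitter(',');
    a = fields.front; fields.popFront();
    b = fields.front; fields.popFront();
    value = fields.front.to!size_t;
}

void main() {
    size_t[string][string] indexed_map;
    foreach (char[] line; stdin.byLine) {
        if (line.length == 0) continue; // skip blank lines
        char[] a, b;
        size_t value;
        splitFields(line, a, b, value);
        // Look up with the slice; copy the key only when it is new.
        auto pA = a in indexed_map;
        if (pA is null) {
            indexed_map[a.idup] = null;
            pA = a in indexed_map;
        }
        if (auto pB = b in *pA) *pB = value;
        else (*pA)[b.idup] = value;
    }
    indexed_map.writeln;
}
```

The slices in a and b point into byLine's buffer, so they are only valid until the next iteration; that is why the keys still get idup'ed on insertion.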
I haven't dug into formattedRead yet, but thanks for letting me
know :)
I was mostly speaking about the pattern with the AA. I guess the
best I can do is a templated function to hide the ugliness.
import std.format : formattedRead;
import std.stdio : stdin, writeln;

// Returns a reference to map[key], inserting Value.init first if the key
// is missing. The key is copied (idup) only on insertion.
ref Value GetWithDefault(Value)(ref Value[string] map, const(char[]) key) {
    auto pValue = key in map;
    if(pValue) return *pValue;
    return map[key.idup] = Value.init;
}

void main() {
    size_t[string][string] indexed_map;
    foreach(char[] line ; stdin.byLine) {
        char[] a;
        char[] b;
        size_t value;
        line.formattedRead!"%s,%s,%d"(a,b,value);
        indexed_map.GetWithDefault(a).GetWithDefault(b) = value;
    }
    indexed_map.writeln;
}
Not too bad, actually!
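To illustrate why the idup on insertion matters, here is a small self-contained check that does not need stdin. It repeats the helper from the post so it compiles on its own; the buffer variable is only a stand-in I made up for the buffer that byLine reuses between lines, and the slicing by hand replaces the parsing step:

```d
// Helper as posted above, repeated so this snippet is self-contained.
ref Value GetWithDefault(Value)(ref Value[string] map, const(char[]) key) {
    auto pValue = key in map;
    if (pValue) return *pValue;
    return map[key.idup] = Value.init;
}

void main() {
    size_t[string][string] indexed_map;

    // Stand-in for the reused line buffer (an assumption of this sketch).
    char[] buffer = "a,b".dup;
    indexed_map.GetWithDefault(buffer[0 .. 1]).GetWithDefault(buffer[2 .. 3]) = 15;

    // Overwrite the buffer in place, as reading the next line would.
    buffer[] = "c,d";
    indexed_map.GetWithDefault(buffer[0 .. 1]).GetWithDefault(buffer[2 .. 3]) = 12;

    // The first entry survives because its keys were copied on insertion.
    assert(indexed_map["a"]["b"] == 15);
    assert(indexed_map["c"]["d"] == 12);

    // Hitting an existing key updates it in place instead of adding an entry.
    indexed_map.GetWithDefault(buffer[0 .. 1]).GetWithDefault(buffer[2 .. 3]) = 99;
    assert(indexed_map["c"].length == 1);
    assert(indexed_map["c"]["d"] == 99);
}
```

If the helper stored the slice itself instead of key.idup, the first key would silently change when the buffer is overwritten, so the copy on the insert path is the one that cannot be avoided.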