On Friday, 5 January 2018 at 13:09:25 UTC, Vino wrote:
Sorry, I'm asking what problem you are solving, what the
program should do, what its idea is. Not what code you have
written.
Hi,
I am trying to implement data dictionary compression; below
is a description of what the program does:
Function read:
This function reads a CSV file that contains 3 columns, stores
the values of each column in its own array (Col1: Array1
(Ucol1), Col2: Array2 (Ucol2), Col3: Array3 (Ucol3)) and returns
the data.
CSV file content:
Miller America 23
John India 42
Baker Australia 21
Zsuwalski Japan 45
Baker America 45
Miller India 23
Function main:
This function receives the data from the function read and
creates an array based on the function's return type
(typeof(read()[i]) Data).
It sorts the data, removes the duplicates and stores the result
in that array.
Then, using the countUntil function, we can accomplish the data
dictionary compression.
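To make the description above concrete, here is a minimal sketch of dictionary compression for the first column of the sample CSV. The helper name dictEncode is hypothetical. The dictionary is the sorted list of distinct values, and each row is replaced by the position of its value in that dictionary, found with countUntil.

```d
import std.algorithm : countUntil, map, sort, uniq;
import std.array : array;
import std.stdio : writeln;
import std.typecons : tuple;

// Hypothetical helper: dictionary-encode one column.
// dict holds the sorted distinct values; keys[j] is the position
// of col[j] inside dict.
auto dictEncode(string[] col)
{
    string[] dict = col.dup.sort.uniq.array; // dup so the input stays in file order
    auto keys = col.map!(v => dict.countUntil(v)).array;
    return tuple(dict, keys);
}

void main()
{
    // First column of the sample CSV.
    auto names = ["Miller", "John", "Baker", "Zsuwalski", "Baker", "Miller"];
    auto enc = dictEncode(names);
    writeln(enc[0]); // ["Baker", "John", "Miller", "Zsuwalski"]
    writeln(enc[1]); // [2, 1, 0, 3, 0, 2]
}
```

Six names shrink to four dictionary entries plus six small integers; the gain grows with the number of repeated values.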
Thank you for the explanation; this is a nice little task.
Here's my version of a solution. I've used ordinary arrays instead
of std.container.array, since the data itself is on the GC heap
anyway.
I used a CSV file separated by tabs, so I told csvReader to use '\t'
as the delimiter.
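A quick way to check the tab-delimited parsing without a file is to run csvReader on an in-memory string. In this sketch the Row alias and the parseTsv helper are hypothetical names; the '\t' argument is csvReader's delimiter parameter, passed the same way as in readData.

```d
import std.array : array;
import std.csv : csvReader;
import std.stdio : writeln;
import std.typecons : Tuple;

// Hypothetical record type matching the three sample columns.
alias Row = Tuple!(string, string, int);

// Parse tab-separated text into typed rows; '\t' is the delimiter argument.
Row[] parseTsv(string text)
{
    return text.csvReader!Row('\t').array;
}

void main()
{
    // Two rows of the sample data, kept in memory instead of data.csv.
    foreach (row; parseTsv("Miller\tAmerica\t23\nJohn\tIndia\t42"))
        writeln(row[0], " ", row[1], " ", row[2]);
}
```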
import std.algorithm: countUntil, joiner, sort, uniq, map;
import std.csv: csvReader;
import std.stdio: File, writeln;
import std.typecons: Tuple, tuple;
import std.meta;
import std.array : array;
//we know types of columns, so let's state them once
alias ColumnTypes = AliasSeq!(string, string, int);
alias Arr(T) = T[];
auto readData() {
    auto file = File("data.csv", "r");
    // tuple of arrays: Tuple!(string[], string[], int[])
    Tuple!( staticMap!(Arr, ColumnTypes) ) res;
    foreach (record;
             file.byLineCopy.joiner("\n").csvReader!(Tuple!ColumnTypes)('\t'))
        foreach (i, T; ColumnTypes)
            res[i] ~= record[i]; // here res[i] can have different types
    return res;
}
//compress a single column
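auto compress(T)(T[] col) {
    // the dictionary: sorted distinct values of the column
    T[] vals = sort(col.dup[]).uniq.array;
    // for every row, the index of its value in the dictionary
    auto ks = col.map!(v => vals.countUntil(v)).array;
    return tuple(vals, ks);
}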
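Because vals is the sorted dictionary, the keys are only useful if they are positions in vals, so this sketch does the lookup with vals.countUntil. Decompression is then just indexing. A minimal round trip on the age column follows; decompress is a hypothetical helper, not part of the code above.

```d
import std.algorithm : countUntil, map, sort, uniq;
import std.array : array;
import std.stdio : writeln;
import std.typecons : tuple;

// Dictionary-compress one column: vals is the sorted set of distinct
// values, ks[j] is the index of col[j] inside vals.
auto compress(T)(T[] col)
{
    T[] vals = col.dup.sort.uniq.array;
    auto ks = col.map!(v => vals.countUntil(v)).array;
    return tuple(vals, ks);
}

// Hypothetical inverse: replace every key by the value it points to.
T[] decompress(T, K)(T[] vals, K[] ks)
{
    return ks.map!(k => vals[k]).array;
}

void main()
{
    int[] ages = [23, 42, 21, 45, 45, 23]; // third column of the sample
    auto vk = compress(ages);
    writeln(vk[0]); // [21, 23, 42, 45]
    writeln(vk[1]); // [1, 2, 0, 3, 3, 1]
    assert(decompress(vk[0], vk[1]) == ages);
}
```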
void main() {
    auto columns = readData();
    // foreach over ColumnTypes is unrolled at compile time
    foreach (i, ColT; ColumnTypes) {
        // here the data can have a different type for each i
        auto vk = compress(columns[i]);
        writeln(vk[0][]); // output data; you could write files here instead
        writeln(vk[1][]); // output indices
    }
}
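Since vals is sorted, the linear countUntil scan could be swapped for a binary search, which starts to matter when a column has many distinct values. This is a different technique from the countUntil approach asked about; compressSorted is a hypothetical name, and the sketch relies on std.range.assumeSorted and SortedRange.lowerBound.

```d
import std.algorithm : map, sort, uniq;
import std.array : array;
import std.range : assumeSorted;
import std.stdio : writeln;
import std.typecons : tuple;

// Same keys as a countUntil-based lookup into the sorted dictionary,
// but each lookup is a binary search: O(log n) instead of O(n).
auto compressSorted(T)(T[] col)
{
    T[] vals = col.dup.sort.uniq.array;
    auto sorted = vals.assumeSorted;
    // lowerBound(v) is the prefix of elements < v; its length is v's
    // index, because every v of col is present in vals.
    auto ks = col.map!(v => sorted.lowerBound(v).length).array;
    return tuple(vals, ks);
}

void main()
{
    // Second column of the sample CSV.
    auto countries = ["America", "India", "Australia", "Japan", "America", "India"];
    auto vk = compressSorted(countries);
    writeln(vk[0]); // ["America", "Australia", "India", "Japan"]
    writeln(vk[1]); // [0, 2, 1, 3, 0, 2]
}
```

For a six-row file either version is fine; the binary search only pays off on large columns.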