Repository: systemml
Updated Branches:
  refs/heads/master 9de00dbb2 -> 87a0c0bd4

[MINOR] Cleanup Kmeans algorithm script (table padding/truncating)

This patch simplifies the kmeans script by folding separate padding and
truncating of table outputs into the table operation itself. This also
improved performance of Kmeans over Mnist80m w/ 20 iterations, 5
centroids and codegen enabled from 581s to 341s (2,164s w/o codegen).

Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/87a0c0bd
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/87a0c0bd
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/87a0c0bd

Branch: refs/heads/master
Commit: 87a0c0bd485ee5255412b24c9230e46ae7de71ad
Parents: 9de00db
Author: Matthias Boehm <[email protected]>
Authored: Tue Jun 19 17:49:19 2018 -0700
Committer: Matthias Boehm <[email protected]>
Committed: Tue Jun 19 17:49:19 2018 -0700

----------------------------------------------------------------------
 scripts/algorithms/Kmeans.dml | 21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/systemml/blob/87a0c0bd/scripts/algorithms/Kmeans.dml
----------------------------------------------------------------------
diff --git a/scripts/algorithms/Kmeans.dml b/scripts/algorithms/Kmeans.dml
index 54bff85..db6a92f 100644
--- a/scripts/algorithms/Kmeans.dml
+++ b/scripts/algorithms/Kmeans.dml
@@ -102,17 +102,13 @@ for (i in 1 : num_centroids)
     centroid_ids = t(colSums (cdf_min_distances < threshold_matrix)) + 1;
 
     # Place the selected centroids together, one per run, into a matrix:
-    centroid_placer = matrix (0, rows = num_runs, cols = (sample_block_size * num_runs));
-    centroid_placer_raw =
-        table (seq (1, num_runs, 1), sample_block_size * seq (0, num_runs - 1, 1) + centroid_ids);
-    centroid_placer [, 1 : ncol (centroid_placer_raw)] = centroid_placer_raw;
+    centroid_placer = table (seq (1, num_runs),
+        sample_block_size * seq (0, num_runs - 1) + centroid_ids, num_runs, sample_block_size * num_runs);
     centroids = centroid_placer %*% X_samples;
 
     # Place the selected centroids into their appropriate slots in All_Centroids:
-    centroid_placer = matrix (0, rows = nrow (All_Centroids), cols = num_runs);
-    centroid_placer_raw =
-        table (seq (i, num_centroids * (num_runs - 1) + i, num_centroids), seq (1, num_runs, 1));
-    centroid_placer [1 : nrow (centroid_placer_raw), ] = centroid_placer_raw;
+    centroid_placer = table (seq (i, num_centroids * (num_runs - 1) + i, num_centroids),
+        seq (1, num_runs, 1), nrow (All_Centroids), num_runs);
     All_Centroids = All_Centroids + centroid_placer %*% centroids;
 
     # Update min_distances to preserve the loop invariant:
@@ -250,14 +246,7 @@ get_sample_maps = function (int num_records, int num_samples, int approx_sample_
     # Use contingency table to create the "sample_maps" matrix that is a vertical concatenation
     # of 0-1-matrices, one per sample, each with 1s at (i, sample_record[i]) and 0s elsewhere:
 
-    sample_maps_raw = table (seq (1, num_rows), sample_rec_ids);
-    max_rec_id = ncol (sample_maps_raw);
-    if (max_rec_id >= num_records) {
-        sample_maps = sample_maps_raw [, 1 : num_records];
-    } else {
-        sample_maps = matrix (0, rows = num_rows, cols = num_records);
-        sample_maps [, 1 : max_rec_id] = sample_maps_raw;
-    }
+    sample_maps = table (seq (1, num_rows), sample_rec_ids, num_rows, num_records);
 
     # Create a 0-1-matrix that maps each sample column ID into all row positions of the
     # corresponding sample; map out-of-sample-range positions to row id = num_rows + 1:
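
Note for readers unfamiliar with DML: the effect of passing output dimensions to table can be sketched in plain Python. This is an illustration only, not SystemML code. The helper names (table_raw, pad_or_truncate, table_with_dims) and the sample ids are hypothetical, and the sketch assumes that entries falling outside the requested dimensions are dropped, as the truncation described in the commit message suggests.

```python
def table_raw(row_ids, col_ids):
    """Contingency table sized by the largest ids seen (ids are 1-based)."""
    n_rows, n_cols = max(row_ids), max(col_ids)
    out = [[0] * n_cols for _ in range(n_rows)]
    for r, c in zip(row_ids, col_ids):
        out[r - 1][c - 1] += 1
    return out


def pad_or_truncate(raw, n_rows, n_cols):
    """Old pattern: copy the raw table into a zero matrix of fixed size."""
    out = [[0] * n_cols for _ in range(n_rows)]
    for r in range(min(n_rows, len(raw))):
        for c in range(min(n_cols, len(raw[r]))):
            out[r][c] = raw[r][c]
    return out


def table_with_dims(row_ids, col_ids, n_rows, n_cols):
    """New pattern: build the table directly at the requested size."""
    out = [[0] * n_cols for _ in range(n_rows)]
    for r, c in zip(row_ids, col_ids):
        # Assumption: cells outside the requested dimensions are ignored.
        if 1 <= r <= n_rows and 1 <= c <= n_cols:
            out[r - 1][c - 1] += 1
    return out


# Hypothetical inputs: three sample rows mapped to record ids 2, 4 and 7.
row_ids = [1, 2, 3]
sample_rec_ids = [2, 4, 7]

# Requested width larger than the largest id: result is zero-padded.
padded = table_with_dims(row_ids, sample_rec_ids, 3, 8)

# Requested width smaller than the largest id: id 7 is truncated away.
truncated = table_with_dims(row_ids, sample_rec_ids, 3, 5)
```

In both cases the direct version gives the same matrix as building the raw table first and then padding or truncating it, without the intermediate table or the extra copy.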
