ReneEnjilian opened a new pull request, #2055: URL: https://github.com/apache/systemds/pull/2055
This pull request implements the Scalable Linear Algebra Benchmark (Slab). The benchmark divides the workloads into three segments: (1) Matrix operators, (2) Pipelines and Decompositions, (3) Bulk LA-based ML Algorithms. For each workload we vary parameters such as the number of rows and the sparsity. In the original paper, the authors also varied parameters such as the number of nodes in their cluster. These parallel experiments are executed via Apache Spark. Given the constraints of my setup (only a single CPU), I ran these experiments via Spark on that one CPU only. To run them on multiple CPUs, one would need to set up a Spark cluster manually.

I left the experiment output files in this pull request to make reviewing easier. The output folders can of course be deleted when merging.

### Potential Bug

I noticed that the `tsmm` operator produces error messages in some experiments, as can be seen in the output directories operators/distributed_sparse and mlAlgorithms/distributed. For the former, the corresponding file is slabGramMatrixSparse_stats.txt; for the latter, it is slabHeteroscedasticityRobustStandardErrorsDistr_stats.txt. So far I have not been able to figure out why the `ArrayIndexOutOfBoundsException` occurs there.
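
For reviewers less familiar with the operator: `tsmm` is the transpose-self matrix multiplication, which for the Gram matrix workload means `t(X) %*% X`. The following is a minimal pure-Python sketch of that computation over a sparse input while varying the number of rows and the sparsity. It is not the SystemDS implementation and does not reproduce the exception; the helper names `random_sparse_rows` and `gram_matrix`, the column count, and the parameter values are illustrative assumptions, not taken from the benchmark scripts.

```python
import random


def random_sparse_rows(n_rows, n_cols, sparsity, seed=0):
    """Return a list of rows, each a dict {column index: nonzero value}.

    `sparsity` is the expected fraction of nonzero cells (1.0 means dense).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row = {}
        for j in range(n_cols):
            if rng.random() < sparsity:
                row[j] = rng.random() + 0.5  # shifted so the value is never zero
        rows.append(row)
    return rows


def gram_matrix(rows, n_cols):
    """Compute G = t(X) %*% X as a dense n_cols x n_cols list of lists."""
    gram = [[0.0] * n_cols for _ in range(n_cols)]
    for row in rows:
        items = list(row.items())
        # Each row contributes the outer product of its nonzeros with themselves.
        for j, vj in items:
            for k, vk in items:
                gram[j][k] += vj * vk
    return gram


if __name__ == "__main__":
    n_cols = 8  # hypothetical column count, chosen only to keep the demo small
    for n_rows in (100, 1000):
        for sparsity in (0.01, 0.1):
            X = random_sparse_rows(n_rows, n_cols, sparsity, seed=42)
            G = gram_matrix(X, n_cols)
            trace = sum(G[i][i] for i in range(n_cols))
            print(f"rows={n_rows} sparsity={sparsity} trace={trace:.3f}")
```

The result is always a square, symmetric `n_cols x n_cols` matrix regardless of the number of rows, which may be a useful sanity check when comparing against the output in the stats files.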