Hi all, our new code generation feature is now stable enough to enter broader testing, with the goal of further improving its capabilities. If you're interested, you can enable it via
<codegen.enabled>true</codegen.enabled> in your SystemML-config.xml file.

The major advantages are fewer intermediates (fewer reads and writes, including potentially fewer evictions), fewer scans of inputs and intermediates, and better sparsity exploitation across chains of operations. On our mainstream algorithms, we see significant improvements over the existing hand-coded fused operators in scenarios with few features, i.e., when vector and matrix intermediates become the bottleneck, and for scripts that lack sparsity-exploiting operations. For example, on a 100M x 10 (8 GB) L2SVM scenario with 20 outer iterations, codegen reduces the runtime from 219s (496s without hand-coded fused operators) to 32s.

So please bring your favorite expressions. If you have interesting scripts, give it a try and share any issues or patterns that we currently don't handle well.

Thanks.
Regards,
Matthias