BACtaki opened a new pull request, #1650: URL: https://github.com/apache/systemds/pull/1650
JIRA: https://issues.apache.org/jira/browse/SYSTEMDS-3390

This patch improves the performance of countDistinctApprox() row/col aggregation by replacing matrix slicing with direct operations on the input matrix. This has the most impact in CP execution mode given the smaller input size (max 1000x1000). Some simple experiments demonstrate this (numbers are averages over 3 runs):

1. Row aggregation
   (A) dense: 10000x1000 with sparsity=0.9
       1.198s with slicing, 0.874s without slicing - a 27% improvement
   (B) sparse: 10000x1000 with sparsity=0.1
       0.528s with slicing, 0.512s without slicing - a 3% improvement

   As expected, the larger and denser the input matrix, the larger the performance improvement.

2. Col aggregation
   (A) dense: 10000x1000 with sparsity=0.9
       1.186s with slicing, 1.036s without slicing - a 13% improvement
   (B) sparse: 10000x1000 with sparsity=0.1
       1.272s with slicing, 0.647s without slicing - a 49% improvement

   In this case, the sparser the input matrix, the larger the performance improvement. This is a result of the hash map M used in the implementation: as the RxC input matrix becomes denser, M's keyset size approaches C, and the performance approaches that of the baseline, which uses slicing.
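For readers unfamiliar with the change, here is a rough sketch of the idea, not the code in this patch. It assumes plain Java arrays instead of SystemDS matrix blocks, and it uses exact counting with a HashSet where countDistinctApprox() uses an approximate estimator. The class name, method names, and the per-row sparse layout (colIdx/vals) are all hypothetical. The column variant illustrates why a hash map keyed by column index grows toward C entries as the input gets denser.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the SystemDS implementation.
class DistinctCountSketch {

    // Baseline: copy each row out of the matrix (the "slice"), then count.
    static long[] rowDistinctWithSlicing(double[][] m) {
        long[] out = new long[m.length];
        for (int r = 0; r < m.length; r++) {
            double[] slice = Arrays.copyOf(m[r], m[r].length); // extra allocation per row
            Set<Double> seen = new HashSet<>();
            for (double v : slice) seen.add(v);
            out[r] = seen.size();
        }
        return out;
    }

    // Direct: read the cells of the input in place, with no intermediate copy.
    static long[] rowDistinctDirect(double[][] m) {
        long[] out = new long[m.length];
        for (int r = 0; r < m.length; r++) {
            Set<Double> seen = new HashSet<>();
            for (int c = 0; c < m[r].length; c++) seen.add(m[r][c]);
            out[r] = seen.size();
        }
        return out;
    }

    // Column aggregation over a sparse input: for each row, colIdx[r] holds the
    // column indices of its nonzeros and vals[r] the matching values (assumed layout).
    // The map only gets an entry for columns that contain at least one nonzero,
    // so its key set approaches numCols as the input becomes denser.
    static long[] colDistinctSparse(int numRows, int numCols, int[][] colIdx, double[][] vals) {
        Map<Integer, Set<Double>> distinctPerCol = new HashMap<>();
        int[] nnz = new int[numCols];
        for (int r = 0; r < numRows; r++) {
            for (int k = 0; k < colIdx[r].length; k++) {
                int c = colIdx[r][k];
                distinctPerCol.computeIfAbsent(c, x -> new HashSet<>()).add(vals[r][k]);
                nnz[c]++;
            }
        }
        long[] out = new long[numCols];
        for (int c = 0; c < numCols; c++) {
            Set<Double> s = distinctPerCol.get(c);
            long d = (s == null) ? 0 : s.size();
            if (nnz[c] < numRows) d++; // the column also holds at least one implicit zero
            out[c] = d;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 2}, {0, 0, 3}, {0, 5, 3}};
        System.out.println(Arrays.toString(rowDistinctWithSlicing(m)));
        System.out.println(Arrays.toString(rowDistinctDirect(m)));
        // The same matrix in the assumed sparse layout.
        int[][] colIdx = {{0, 1, 2}, {2}, {1, 2}};
        double[][] vals = {{1, 2, 2}, {3}, {5, 3}};
        System.out.println(Arrays.toString(colDistinctSparse(3, 3, colIdx, vals)));
    }
}
```

Both row variants return the same counts; the direct one only avoids the per-row copy. Any speedup in a toy like this will not match the timings reported above, which were measured on the actual implementation.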