ilovemesomeramen opened a new pull request #1261: URL: https://github.com/apache/systemds/pull/1261
This PR adds basic multithreading capability to the transform encode implementation. Each ColumnEncoder can be executed on a separate thread, or can be split into even smaller subjobs that each apply only to a certain row range.

Initial benchmarks with 16 CPUs show up to a 50x speedup compared to the old SystemML implementation.

Currently this code is dormant: a call to `transformencode` in a DML script still uses the single-threaded implementation. This will change once further improvements and testing are complete.

Large matrices (e.g. 1000000x1000) are still not viable, which we suspect is due to thread starvation. This will be addressed in a future PR with some sort of access partitioning (Radix/Range).

This PR also brings back sparse support for large dummy-coded matrices, which was accidentally removed in a prior PR.
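To illustrate the task split described above (one job per column encoder, optionally divided into row-range subjobs), here is a minimal sketch in plain Java. It is not the code from this PR: the class `ParallelEncodeSketch`, the method `encode`, and the use of `DoubleUnaryOperator` as a stand-in for a column encoder are all hypothetical, and only the apply step is modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch, not the SystemDS implementation.
public class ParallelEncodeSketch {

    // Applies encoders[c] to column c of 'in', creating one task per
    // (column, row block) pair. blockSize must be positive.
    public static double[][] encode(double[][] in, DoubleUnaryOperator[] encoders,
            int blockSize, int numThreads) throws Exception {
        int rows = in.length;
        int cols = encoders.length;
        double[][] out = new double[rows][cols];

        List<Callable<Void>> tasks = new ArrayList<>();
        for (int c = 0; c < cols; c++) {
            for (int start = 0; start < rows; start += blockSize) {
                final int col = c;
                final int lo = start;
                final int hi = Math.min(start + blockSize, rows);
                // Each task writes a disjoint set of cells, so the tasks
                // need no locking among themselves.
                tasks.add(() -> {
                    for (int r = lo; r < hi; r++)
                        out[r][col] = encoders[col].applyAsDouble(in[r][col]);
                    return null;
                });
            }
        }

        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            // invokeAll blocks until every task has finished.
            for (Future<Void> f : pool.invokeAll(tasks))
                f.get(); // rethrows a task failure as ExecutionException
        } finally {
            pool.shutdown();
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        double[][] in = { {1, 10}, {2, 20}, {3, 30}, {4, 40}, {5, 50} };
        // Two toy "encoders": double the first column, increment the second.
        DoubleUnaryOperator[] encoders = { v -> v * 2, v -> v + 1 };
        double[][] out = encode(in, encoders, 2, 4);
        for (double[] row : out)
            System.out.println(Arrays.toString(row));
    }
}
```

In this sketch, a smaller `blockSize` yields more, finer-grained subjobs per column, which helps balance load when there are few columns but adds scheduling overhead. It does not attempt the access partitioning mentioned for very large matrices.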
