ilovemesomeramen opened a new pull request #1261:
URL: https://github.com/apache/systemds/pull/1261


   This PR adds basic multithreading capability to the transform encode 
implementation.
   Each ColumnEncoder can be executed on a separate thread or split into 
even smaller subjobs, each of which applies only to a certain row range.
   Initial benchmarks with 16 CPUs show up to a 50x speedup compared to 
the old SystemML implementation.
   Currently this code is dormant: a call to `transformencode` in a 
DML script still uses the single-threaded implementation. This will change 
once further improvements and testing are complete.
   Large matrices (e.g. 1000000x1000) are still not viable due to suspected 
thread starvation. This will be addressed in a future PR with some form of 
access partitioning (radix/range).
   
   This PR also brings back sparse support for large dummy-coded matrices, 
which was accidentally removed in a prior PR.
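
The column-wise and row-range splitting described above can be illustrated with a minimal sketch. This is not the code in this PR: the class `RowRangeEncodeSketch`, the method `encode`, and the parameters `blockSize` and `numThreads` are hypothetical names chosen for illustration, and a plain `DoubleUnaryOperator` stands in for a real ColumnEncoder. It assumes each column can be encoded cell by cell, independently of other rows, which does not hold for encoders that first need a build pass over the whole column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration only; not part of the SystemDS code base.
class RowRangeEncodeSketch {

    // Applies encoders[c] to column c of 'in'. Every column is split into
    // row-range subjobs of at most 'blockSize' rows, and all subjobs are
    // submitted to a fixed-size thread pool.
    static double[][] encode(double[][] in, DoubleUnaryOperator[] encoders,
                             int blockSize, int numThreads) {
        int rows = in.length;
        int cols = encoders.length;
        double[][] out = new double[rows][cols];
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int c = 0; c < cols; c++) {
                for (int start = 0; start < rows; start += blockSize) {
                    final int col = c;
                    final int from = start;
                    final int to = Math.min(start + blockSize, rows);
                    // Each subjob writes a disjoint set of output cells,
                    // so no locking is needed between subjobs.
                    pending.add(pool.submit(() -> {
                        for (int r = from; r < to; r++)
                            out[r][col] = encoders[col].applyAsDouble(in[r][col]);
                    }));
                }
            }
            // Wait for all subjobs; get() also rethrows any task failure.
            for (Future<?> f : pending)
                f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return out;
    }
}
```

With `blockSize` equal to the number of rows, this degenerates to one task per column; smaller values produce more, finer-grained subjobs at the cost of extra scheduling overhead.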
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

