Baunsgaard opened a new pull request #1196: URL: https://github.com/apache/systemds/pull/1196
This commit adds a mapping abstraction, along with several implementations of it, that allows DDC to use an arbitrary number of bits per entry. The widths currently supported are 1, 8, 16, and 32 bits; a student task has been added to implement others. The same abstraction is also used in SDC, with similar compression benefits.

This commit also fixes various bugs and improves the compression ratio on census from 15x to 23x. This gives a 1.4x faster PCA execution time, including IO and compression, on both census and classic MNIST.

Finally, this commit contains the beginning of an insertion sorter for efficient construction of SDC column groups. Currently only a naive implementation is included; it works well for few unique values but does not scale well to larger numbers of them.

[Images: Normal, CLA]
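
To illustrate the idea of a mapping with a variable number of bits per entry, here is a minimal sketch. It is not the code in this PR: the names `DictMapping`, `BitMapping`, `ByteMapping`, `CharMapping`, `IntMapping`, and `create` are hypothetical, and the thresholds simply assume that the narrowest of the four widths (1, 8, 16, 32 bits) able to hold every dictionary index is chosen.

```java
import java.util.BitSet;

/** Hypothetical abstraction: maps a row index to a dictionary index. */
abstract class DictMapping {
    abstract int get(int row);
    abstract void set(int row, int value);
    abstract int bitsPerEntry();

    /** Assumed selection rule: narrowest width that fits numUnique distinct ids. */
    static DictMapping create(int rows, int numUnique) {
        if (numUnique <= 2) return new BitMapping(rows);
        if (numUnique <= 256) return new ByteMapping(rows);
        if (numUnique <= 65536) return new CharMapping(rows);
        return new IntMapping(rows);
    }
}

/** 1 bit per entry, backed by java.util.BitSet. */
final class BitMapping extends DictMapping {
    private final BitSet bits;
    BitMapping(int rows) { bits = new BitSet(rows); }
    @Override int get(int row) { return bits.get(row) ? 1 : 0; }
    @Override void set(int row, int value) { bits.set(row, value == 1); }
    @Override int bitsPerEntry() { return 1; }
}

/** 8 bits per entry; bytes are read back as unsigned values 0..255. */
final class ByteMapping extends DictMapping {
    private final byte[] data;
    ByteMapping(int rows) { data = new byte[rows]; }
    @Override int get(int row) { return data[row] & 0xFF; }
    @Override void set(int row, int value) { data[row] = (byte) value; }
    @Override int bitsPerEntry() { return 8; }
}

/** 16 bits per entry; char is Java's unsigned 16-bit primitive. */
final class CharMapping extends DictMapping {
    private final char[] data;
    CharMapping(int rows) { data = new char[rows]; }
    @Override int get(int row) { return data[row]; }
    @Override void set(int row, int value) { data[row] = (char) value; }
    @Override int bitsPerEntry() { return 16; }
}

/** 32 bits per entry. */
final class IntMapping extends DictMapping {
    private final int[] data;
    IntMapping(int rows) { data = new int[rows]; }
    @Override int get(int row) { return data[row]; }
    @Override void set(int row, int value) { data[row] = value; }
    @Override int bitsPerEntry() { return 32; }
}
```

Under these assumptions, a column with 200 distinct values would get an 8-bit mapping rather than a 32-bit one, which is where the size reduction comes from.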
