Baunsgaard commented on pull request #931: URL: https://github.com/apache/systemds/pull/931#issuecomment-658081701
@mboehm7 As requested, here is a comparison of performance before (master) and after (this branch). With this I will also stop committing to this branch, so that it can be reviewed.

I have disabled two key features that will hopefully improve performance once re-implemented, because I intend to slightly change the way they are done:

- Dictionary sharing. I intend to enable sharing across different col-group types, since we now have a shared representation for dictionaries. I intend to move this step to before the construction of the ColGroups. This will allow the CompressedMatrixBlock object to store pointers to all the dictionaries, which speeds up value-only computations, and the ColGroups will then be oblivious to whether their dictionaries are shared.
- CoCoding. This is currently disabled because (1) it increases compression time and (2) it does not improve the compression ratio on the covType dataset.

Before (on master branch):

```code
DATA    , RUN                   , TYPE , TIME ms , REP
covtype , MatrixVector mv       , cla  ,   1.980 , 100
covtype , MatrixVector vm       , cla  ,   3.310 , 100
covtype , scalar mult           , cla  ,   3.900 , 100
covtype , scalar plus           , cla  ,  13.180 , 100
covtype , unaryAggregate sum    , cla  ,   1.992 , 500
covtype , unaryAggregate rowsum , cla  ,  23.740 , 500
covtype , unaryAggregate colsum , cla  ,  24.556 , 500
covtype , unaryAggregate colmax , cla  ,   0.122 , 500
covtype , unaryAggregate max    , cla  ,     nan , 0
covtype , unaryAggregate min    , cla  ,   0.100 , 500
covtype , unaryAggregate rowmax , cla  ,  44.208 , 500
```

After:

```code
DATA    , RUN                   , TYPE , TIME ms , REP
covtype , MatrixVector mv       , cla  ,   1.916 , 1000
covtype , MatrixVector mv       , lcla ,   1.752 , 1000
covtype , MatrixVector vm       , cla  ,   4.138 , 1000
covtype , MatrixVector vm       , lcla ,   3.764 , 1000
covtype , scalar mult           , cla  ,   0.157 , 1000
covtype , scalar mult           , lcla ,   0.129 , 1000
covtype , scalar plus           , cla  ,   0.249 , 1000
covtype , scalar plus           , lcla ,   0.212 , 1000
covtype , unaryAggregate sum    , cla  ,   0.828 , 500
covtype , unaryAggregate sum    , lcla ,   2.790 , 500
covtype , unaryAggregate rowsum , cla  ,  12.075 , 3000
covtype , unaryAggregate rowsum , lcla ,  33.120 , 3000
covtype , unaryAggregate colsum , cla  ,   0.834 , 500
covtype , unaryAggregate colsum , lcla ,   2.886 , 500
covtype , unaryAggregate colmax , cla  ,   0.259 , 3000
covtype , unaryAggregate colmax , lcla ,   0.039 , 3000
covtype , unaryAggregate max    , cla  ,   0.142 , 500
covtype , unaryAggregate max    , lcla ,   0.064 , 500
covtype , unaryAggregate min    , cla  ,   0.170 , 500
covtype , unaryAggregate min    , lcla ,   0.118 , 500
covtype , unaryAggregate rowmax , cla  ,  31.253 , 3000
covtype , unaryAggregate rowmax , lcla ,  69.297 , 3000
```

Uncompressed performance:

```code
covtype , MatrixVector mv       , ula  ,   6.230 , 1000
covtype , MatrixVector vm       , ula  ,   8.895 , 1000
covtype , scalar mult           , ula  ,  34.050 , 300
covtype , scalar plus           , ula  ,  63.683 , 300
covtype , unaryAggregate sum    , ula  ,   7.146 , 500
covtype , unaryAggregate rowsum , ula  ,  10.895 , 3000
covtype , unaryAggregate colsum , ula  ,   8.268 , 500
covtype , unaryAggregate colmax , ula  ,   7.886 , 3000
covtype , unaryAggregate max    , ula  ,   7.116 , 500
covtype , unaryAggregate min    , ula  ,   7.508 , 500
covtype , unaryAggregate rowmax , ula  ,   8.403 , 3000
```
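
As a rough aid for reading the numbers above, the following Python sketch computes the before/after time ratio for each `cla` row. It assumes the `TIME ms` values are directly comparable between the two runs even though the `REP` counts differ, which may not hold exactly. The names `before`, `after`, and `speedups` are illustrative only and are not part of SystemDS.

```python
import math

# "TIME ms" values for the cla rows, copied from the tables in this comment.
before = {
    "MatrixVector mv": 1.980,
    "MatrixVector vm": 3.310,
    "scalar mult": 3.900,
    "scalar plus": 13.180,
    "unaryAggregate sum": 1.992,
    "unaryAggregate rowsum": 23.740,
    "unaryAggregate colsum": 24.556,
    "unaryAggregate colmax": 0.122,
    "unaryAggregate max": float("nan"),  # reported as nan with REP 0 on master
    "unaryAggregate min": 0.100,
    "unaryAggregate rowmax": 44.208,
}
after = {
    "MatrixVector mv": 1.916,
    "MatrixVector vm": 4.138,
    "scalar mult": 0.157,
    "scalar plus": 0.249,
    "unaryAggregate sum": 0.828,
    "unaryAggregate rowsum": 12.075,
    "unaryAggregate colsum": 0.834,
    "unaryAggregate colmax": 0.259,
    "unaryAggregate max": 0.142,
    "unaryAggregate min": 0.170,
    "unaryAggregate rowmax": 31.253,
}


def speedups(before, after):
    """Return before/after time ratios, skipping runs without a valid time."""
    result = {}
    for run, t_before in before.items():
        t_after = after.get(run)
        if t_after is None or math.isnan(t_before) or math.isnan(t_after):
            continue
        result[run] = t_before / t_after
    return result


ratios = speedups(before, after)
# Print the largest improvements first.
for run, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{run:<24} {ratio:6.2f}x")
```

A ratio above 1 means the operation is faster on this branch, and a ratio below 1 means it is slower. The `unaryAggregate max` row is skipped because master reported `nan` for it.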
