Baunsgaard commented on pull request #1127: URL: https://github.com/apache/systemds/pull/1127#issuecomment-748006108
> * For dense transpose operations, we have two significant parts: allocating the dense output, and the multi-threaded transpose operation. On a box with 112 vcores, the allocation is 10x more expensive than the actual transpose operation. The conclusion would be an in-place transpose wherever possible. For example, compression is injected directly after the persistent read which makes it safe to use in-place by default for both local and distributed compression. This approach would not just improve compression times but also eliminate the unnecessary temporary memory requirements. I leave this up to you though.

I will look at this! :+1:
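For reference, here is a minimal sketch of what an in-place transpose of a dense row-major block could look like. It is not the SystemDS implementation; the class `InPlaceTranspose` and the method `transposeInPlace` are hypothetical names, and the sketch is single-threaded. It uses cycle following, so the only extra memory is one bit per cell rather than a second dense array of doubles.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration, not SystemDS code.
final class InPlaceTranspose {

    // Transposes a rows x cols row-major matrix stored in 'a' in place.
    // Afterwards 'a' holds the cols x rows transpose, also row-major.
    static void transposeInPlace(double[] a, int rows, int cols) {
        int n = rows * cols;
        if (a.length != n)
            throw new IllegalArgumentException("array length must equal rows * cols");
        if (n <= 2)
            return; // layout is unchanged for empty, 1x1, 1x2 and 2x1 inputs

        int last = n - 1;               // first and last cells never move
        BitSet moved = new BitSet(n);   // one bit per cell instead of a full copy

        for (int start = 1; start < last; start++) {
            if (moved.get(start))
                continue;               // already placed by an earlier cycle
            double carried = a[start];
            int cur = start;
            do {
                // Cell r*cols + c moves to c*rows + r, which equals (cur * rows) mod (n - 1).
                int dest = (int) (((long) cur * rows) % last);
                double tmp = a[dest];
                a[dest] = carried;
                carried = tmp;
                moved.set(dest);
                cur = dest;
            } while (cur != start);
        }
    }

    public static void main(String[] args) {
        double[] m = {1, 2, 3, 4, 5, 6}; // 2 x 3 matrix
        transposeInPlace(m, 2, 3);
        System.out.println(Arrays.toString(m)); // prints [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
    }
}
```

The cycle-following access pattern is less cache-friendly than a blocked out-of-place transpose, so whether it wins in practice would depend on how much of the total time is really spent in allocation, as in the measurement quoted above.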
