Github user fommil commented on the pull request:

https://github.com/apache/incubator-spark/pull/575#issuecomment-35218098

@martinjaggi I'm happy to advise on the best sparse format for any particular problem you want to solve in Spark. Just let me know which matrix operations you're performing (noting the sort of structure you expect for each symbol) and at what points the formats have to be sent over the wire.

I wouldn't get too caught up in sparse benchmarks. All they will show is which storage format works well for the benchmarked problem. I could give you some incredibly efficient sparse formats that would epically fail that test, because they are designed for a different problem.

Column vs row compression is a classic example: column-compressed formats are great for multiplication from the right (or transpose multiplication), whereas row-compressed formats are great for multiplication from the left... but even that depends on the format of the matrix or vector on the right. And this might not be the most efficient format from a memory point of view... what if the matrices have a low bandwidth?
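
To make the row vs column point concrete, here is a minimal sketch in plain Python. It is illustrative only: the helper names (`to_csr`, `to_csc`, `csr_times_vector`, `vector_times_csc`) are hypothetical and are not part of Spark or any existing library. It also assumes one reading of the comment above, namely that "multiplication from the left" means the sparse matrix is the left operand (`A x`) and "from the right" means it is the right operand (`x^T A`, which is the same as `A^T x`).

```python
# Illustrative sketch only: hypothetical helpers, not a Spark or library API.

def to_csr(dense):
    """Compressed sparse row: the nonzeros of row i live at
    positions row_ptr[i] .. row_ptr[i + 1] - 1 of values/col_idx."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def to_csc(dense):
    """Compressed sparse column: the CSC arrays of A are the CSR arrays of A^T."""
    transposed = [list(col) for col in zip(*dense)]
    values, row_idx, col_ptr = to_csr(transposed)
    return values, row_idx, col_ptr


def csr_times_vector(csr, x):
    """y = A x. Each output entry is one contiguous pass over a stored row."""
    values, col_idx, row_ptr = csr
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


def vector_times_csc(x, csc):
    """y = x^T A (equivalently A^T x). Each output entry is one
    contiguous pass over a stored column."""
    values, row_idx, col_ptr = csc
    y = []
    for j in range(len(col_ptr) - 1):
        start, end = col_ptr[j], col_ptr[j + 1]
        y.append(sum(values[k] * x[row_idx[k]] for k in range(start, end)))
    return y


A = [[1, 0, 2],
     [0, 3, 0],
     [4, 0, 5]]
x = [1, 2, 3]

print(csr_times_vector(to_csr(A), x))   # A x
print(vector_times_csc(x, to_csc(A)))   # x^T A
```

Computing `x^T A` from the row-compressed arrays, or `A x` from the column-compressed ones, is still possible, but it scatters updates across the output instead of reading one contiguous slice per entry, which is why a benchmark built around one of these operations says little about the other. For a banded matrix, diagonal (band) storage can drop the index arrays entirely, which is the memory consideration raised at the end of the comment.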
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. To do so, please top-post your response. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.