GitHub user martinjaggi commented on the pull request:

https://github.com/apache/incubator-spark/pull/575#issuecomment-35219684

@dlwh Thanks! This is of course a nice idea. Perhaps surprisingly (and fortunately for us), such tricks do not even seem necessary in the current state-of-the-art algorithms. It is usually faster to make the smaller but earlier updates after each dot product, i.e. each worker/thread computes one dot product and then immediately updates its weight vector (as is typical in SGD, for example).

Taking a step back, I think the PR by @mengxr here is very nice and provides the right kind of interface for everything that relies on vectors. (Just noting that we have to keep an eye on serialization speed, but that seems quite feasible with the current code structure, right?)
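
To illustrate the "one dot product, then an immediate update" pattern mentioned above, here is a minimal single-threaded sketch in plain Python. It is not code from the PR or from MLlib: the squared loss, the toy data, and the names `dot` and `sgd_epoch` are assumptions chosen only for illustration.

```python
def dot(w, x):
    """Plain dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sgd_epoch(w, examples, step_size):
    """One pass over (features, label) pairs, assuming squared loss.

    The weight vector w is updated in place right after each dot product,
    rather than accumulating a batch of gradients first.
    """
    for x, y in examples:
        residual = dot(w, x) - y              # one dot product per example
        for j, xj in enumerate(x):            # immediate update of w
            w[j] -= step_size * residual * xj
    return w


# Hypothetical toy data generated from y = 2*x0 - 1*x1.
examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w = [0.0, 0.0]
for _ in range(200):
    sgd_epoch(w, examples, step_size=0.1)
```

On this toy problem the weights approach `[2.0, -1.0]`. In a multi-worker setting, each worker would run a loop like this on its own partition of the data.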