GitHub user pwendell commented on the pull request:

https://github.com/apache/incubator-spark/pull/635#issuecomment-35841766

Hey @coderxiang - this is interesting functionality but I'm -1 on including it in the standard API. The main reason is that this will perform poorly on most large datasets and make it easy for people to shoot themselves in the foot. A second reason is that the use case isn't totally clear - as per some of @markhamstra's comments.