GitHub user mengxr opened a pull request:

    https://github.com/apache/incubator-spark/pull/578

    Adding assignRanks and assignUniqueIds to RDD

    Assigning ranks to an ordered or unordered data set is a common operation. 
This can be done by first counting the records in each partition and then 
assigning ranks in parallel.
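
The two-pass scheme described above can be sketched in plain Python, with
partitions modeled as a list of lists rather than a real RDD. The function name
`assign_ranks` and the list-of-lists layout are illustrative assumptions, not
the code in this pull request.

```python
from itertools import accumulate


def assign_ranks(partitions):
    """Pair each record with a 0-based global rank (hypothetical helper)."""
    # Pass 1: count the records in each partition. On a cluster this
    # would be the extra job that runs before ranks can be assigned.
    counts = [len(part) for part in partitions]

    # The first rank in a partition is the total size of all earlier ones.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Pass 2: each partition only needs its own offset, so the
    # partitions can be processed independently of each other.
    return [
        [(record, offset + i) for i, record in enumerate(part)]
        for part, offset in zip(partitions, offsets)
    ]


ranked = assign_ranks([["a", "b"], [], ["c", "d", "e"]])
```

Ranks follow partition order and then the order of records within each
partition, so the result is meaningful as a ranking only if the data was sorted
beforehand.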
    
    The purpose of assigning ranks to an unordered set is usually to get a 
unique id for each item, e.g., to map feature names to feature indices. In such 
cases, the assignment could be done without counting records, saving one Spark 
job.
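
One way to get unique ids without a counting pass is to interleave ids by
partition index. The sketch below is plain Python with partitions modeled as
lists. The scheme and the name `assign_unique_ids` are assumptions made for
illustration and may differ from what the patch does.

```python
def assign_unique_ids(partitions):
    """Pair each record with a unique id, without counting (hypothetical)."""
    n = len(partitions)
    # The i-th record of partition k gets id k + i * n. Ids from different
    # partitions differ modulo n, so no two records can collide, and each
    # partition needs only its own index and the partition count.
    return [
        [(record, k + i * n) for i, record in enumerate(part)]
        for k, part in enumerate(partitions)
    ]


with_ids = assign_unique_ids([["a", "b"], [], ["c", "d", "e"]])
```

The ids are unique but not consecutive when partitions differ in size, which
is usually acceptable when the goal is only to map names to indices.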
    
    https://spark-project.atlassian.net/browse/SPARK-1076

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark rank

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/578.patch

----
commit 21b434b77f1a7ffd75ba2d1ad4ab2296f1914971
Author: Xiangrui Meng <m...@databricks.com>
Date:   2014-02-10T23:18:41Z

    add assignRanks and assignUniqueIds to RDD

commit 630868c88f14ea955991acfd3d68caa8be6dedec
Author: Xiangrui Meng <m...@databricks.com>
Date:   2014-02-10T23:20:21Z

    newline

----
