[ https://issues.apache.org/jira/browse/SPARK-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-7150:
-------------------------------
    Summary: SQLContext.range()  (was: Facilitate random column generation for DataFrames)

> SQLContext.range()
> ------------------
>
>                 Key: SPARK-7150
>                 URL: https://issues.apache.org/jira/browse/SPARK-7150
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, SQL
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>              Labels: starter
>
> It would be handy to have easy ways to construct random columns for DataFrames.
> Proposed API:
> {code}
> class SQLContext {
>   // Return a DataFrame with a single column named "id" that holds
>   // consecutive values from 0 (inclusive) to n (exclusive).
>   def range(n: Long): DataFrame
>   def range(n: Long, numPartitions: Int): DataFrame
> }
> {code}
> Usage:
> {code}
> import org.apache.spark.sql.functions.{rand, randn}
> // ctx is an existing SQLContext
> // uniform distribution
> ctx.range(1000).select(rand())
> // normal distribution
> ctx.range(1000).select(randn())
> {code}
> We should add a RangeIterator that supports Long start/stop positions, and then use it to create an RDD as the basis for this DataFrame.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
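The RangeIterator idea from the issue can be sketched in plain Scala, without Spark, to show how a Long range might be split into partitions and turned into a random column. This is a minimal sketch under stated assumptions, not Spark's implementation: `RangeIterator`, `RangeSketch`, `sliceBounds` and `randomColumn` are hypothetical names; each "partition" is just an iterator over one contiguous [lo, hi) slice of [0, n); and `scala.util.Random`, seeded with seed + partition index, stands in for `rand()` / `randn()`.

```scala
import scala.util.Random

// Hypothetical sketch (not Spark's code): an iterator over the Longs in
// [start, end) that never materializes the whole range in memory.
final class RangeIterator(start: Long, end: Long) extends Iterator[Long] {
  require(start <= end, s"start ($start) must not exceed end ($end)")
  private var current: Long = start

  override def hasNext: Boolean = current < end

  override def next(): Long = {
    if (!hasNext) throw new NoSuchElementException("RangeIterator is exhausted")
    val value = current
    current += 1
    value
  }
}

// Hypothetical helpers mimicking range(n, numPartitions) without Spark:
// each "partition" is simply an iterator over one contiguous slice of [0, n).
object RangeSketch {
  // Splits [0, n) into numPartitions contiguous slices whose sizes differ by at most one.
  def sliceBounds(n: Long, numPartitions: Int): Seq[(Long, Long)] = {
    require(n >= 0 && numPartitions > 0, "need n >= 0 and numPartitions > 0")
    (0 until numPartitions).map { i =>
      // BigInt avoids Long overflow in i * n when n is close to Long.MaxValue.
      val lo = (BigInt(i) * n / numPartitions).toLong
      val hi = (BigInt(i + 1) * n / numPartitions).toLong
      (lo, hi)
    }
  }

  // The "id" column: one lazy iterator per partition.
  def range(n: Long, numPartitions: Int): Seq[Iterator[Long]] =
    sliceBounds(n, numPartitions).map { case (lo, hi) => new RangeIterator(lo, hi) }

  // Stand-in for range(n).select(rand()) / select(randn()): one draw per id,
  // using a Random seeded with (seed + partition index) in each partition.
  def randomColumn(n: Long, numPartitions: Int, seed: Long)(
      draw: Random => Double): Seq[Double] =
    range(n, numPartitions).zipWithIndex.flatMap { case (ids, partition) =>
      val rng = new Random(seed + partition)
      ids.map(_ => draw(rng))
    }
}
```

For example, `RangeSketch.randomColumn(1000L, 4, seed = 42L)(_.nextDouble())` plays the role of the uniform-distribution usage above, and passing `_.nextGaussian()` instead plays the role of the normal one. A dedicated Long iterator matters because Scala's `0 until n` is Int-based; the BigInt arithmetic in `sliceBounds` keeps slice boundaries exact even for n near Long.MaxValue.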