Re: Add row IDs column to data frame

2017-01-12 Thread ayan guha
Just in case you are more comfortable with SQL, row_number over () should also generate a unique id. On Thu, Jan 12, 2017 at 7:00 PM, akbar501 wrote: > The following are 2 different approaches to adding an id/index to RDDs and 1 approach to adding an index to a
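
A minimal sketch of the row_number suggestion, assuming Spark 2.x or later with a local SparkSession. The object name `RowNumberId`, the helper `addRowNumber`, and the sample columns are hypothetical and not from the thread. Recent Spark versions generally reject an empty window for row_number, so this sketch orders by a column; a window without partitioning also pulls all rows into a single partition, which may be slow on large data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

object RowNumberId {
  // Hypothetical helper: adds a 1-based "id" column numbered in the order of `orderCol`.
  def addRowNumber(df: DataFrame, orderCol: String): DataFrame =
    df.withColumn("id", row_number().over(Window.orderBy(orderCol)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("row-number-id")
      .getOrCreate()
    import spark.implicits._

    // Illustrative sample data only.
    val df = Seq(("a", 10), ("b", 20), ("c", 30)).toDF("name", "value")

    // DataFrame API form.
    addRowNumber(df, "name").show()

    // Equivalent SQL form, for those more comfortable with SQL.
    df.createOrReplaceTempView("t")
    spark.sql("SELECT *, row_number() OVER (ORDER BY name) AS id FROM t").show()

    spark.stop()
  }
}
```

The resulting ids are consecutive, which the partition-based id functions do not guarantee.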

Re: Add row IDs column to data frame

2017-01-12 Thread akbar501
The following are 2 different approaches to adding an id/index to RDDs and 1 approach to adding an index to a DataFrame.

Add an index column to an RDD

```scala
// RDD
val dataRDD = sc.textFile("./README.md")
// Add index then set index as key in map() transformation
// Results in RDD[(Long,
```
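
The snippet above is cut off in the archive, so the following is only a guess at the kind of approaches meant, not the original message. It sketches two RDD options (`zipWithIndex` and `zipWithUniqueId`) and one DataFrame option (`monotonically_increasing_id`), assuming Spark 2.x or later. The object name `IndexApproaches`, the helper names, and the sample data are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.monotonically_increasing_id

object IndexApproaches {
  // Consecutive 0-based index used as the key. On an RDD with more than
  // one partition, zipWithIndex has to run a Spark job to count elements.
  def withSequentialIndex(rdd: RDD[String]): RDD[(Long, String)] =
    rdd.zipWithIndex().map { case (line, idx) => (idx, line) }

  // Unique but not necessarily consecutive ids; no extra job is needed.
  def withUniqueId(rdd: RDD[String]): RDD[(Long, String)] =
    rdd.zipWithUniqueId().map { case (line, id) => (id, line) }

  // Unique, increasing 64-bit ids that are not guaranteed to be consecutive.
  def withIdColumn(df: DataFrame): DataFrame =
    df.withColumn("id", monotonically_increasing_id())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("index-approaches")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data in place of the README.md file used in the thread.
    val lines = spark.sparkContext.parallelize(Seq("alpha", "beta", "gamma", "delta"), numSlices = 2)

    withSequentialIndex(lines).collect().foreach(println)
    withUniqueId(lines).collect().foreach(println)
    withIdColumn(lines.toDF("line")).show()

    spark.stop()
  }
}
```

If consecutive ids matter, `zipWithIndex` is the option here that provides them; the other two only promise uniqueness.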

Re: Add row IDs column to data frame

2017-01-11 Thread akbar501
RDDs, DataFrames and Datasets are all immutable, so you cannot edit any of them in place. Instead, the approach you should take is to call transformation functions on the RDD/DataFrame/Dataset. RDD transformation functions will return a new RDD, DataFrame transformations will return a new DataFrame and
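
A small sketch of what that immutability looks like in practice, assuming Spark 2.x or later. The object name `ImmutableTransformations`, the helper `addSourceColumn`, and the column names are hypothetical. The point is that `withColumn` hands back a new DataFrame and leaves the original unchanged.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object ImmutableTransformations {
  // Returns a NEW DataFrame with one extra constant column; `df` itself is untouched.
  def addSourceColumn(df: DataFrame): DataFrame =
    df.withColumn("source", lit("demo"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("immutable-transformations")
      .getOrCreate()
    import spark.implicits._

    val original = Seq((1, 1), (1, 2)).toDF("col1", "col2")
    val extended = addSourceColumn(original)

    println(original.columns.mkString(",")) // col1,col2
    println(extended.columns.mkString(",")) // col1,col2,source

    spark.stop()
  }
}
```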

Fwd: Add row IDs column to data frame

2015-10-02 Thread Josh Levy-Kramer
Hi, I've created a simple example using the withColumn method, but it throws an error. Try: val df = List( (1,1), (1,1), (1,2), (2,2) ).toDF("col1", "col2") val index_col = sqlContext.range( df.count() ).col("id") val df_with_index = df.withColumn("index", index_col) The error I get is:
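
One possible workaround, offered as a sketch rather than as the fix from the thread: `withColumn` generally expects a column expression derived from the DataFrame it is called on, so a column taken from a separate DataFrame is likely to be rejected. The version below instead indexes the underlying RDD and rebuilds the DataFrame with an extended schema. It assumes Spark 2.x or later (SparkSession rather than the sqlContext used above); the object name `AddIndexColumn` and the helper `withIndex` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object AddIndexColumn {
  // Appends a consecutive 0-based Long column called `name` to `df`.
  def withIndex(spark: SparkSession, df: DataFrame, name: String): DataFrame = {
    // Pair every row with its position, then append the position to the row values.
    val rows = df.rdd.zipWithIndex().map { case (row, idx) =>
      Row.fromSeq(row.toSeq :+ idx)
    }
    // Original schema plus one non-nullable Long field for the index.
    val schema = StructType(df.schema.fields :+ StructField(name, LongType, nullable = false))
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("add-index-column")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the message above.
    val df = List((1, 1), (1, 1), (1, 2), (2, 2)).toDF("col1", "col2")
    withIndex(spark, df, "index").show()

    spark.stop()
  }
}
```

The round trip through the RDD costs some performance, so for large data a column function such as `monotonically_increasing_id` may be preferable when consecutive values are not required.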

Re: Add row IDs column to data frame

2015-04-09 Thread Bojan Kostic
-- If you reply to this email, your message will be added to the discussion below: http://apache-spark-user-list.1001560.n3.nabble.com/Add-row-IDs-column-to-data-frame-tp22385p22430.html To start a new topic under Apache Spark User List, email ml-node+s1001560n1...@n3.nabble.com

Re: Add row IDs column to data frame

2015-04-08 Thread olegshirokikh
-IDs-column-to-data-frame-tp22385p22427.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h

Re: Add row IDs column to data frame

2015-04-08 Thread Bojan Kostic
?

Add row IDs column to data frame

2015-04-05 Thread olegshirokikh
to dataDF.count().toInt).toDF(ID) dataDF = dataDF.withColumn(ID, rowDF(ID)) Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Add-row-IDs-column-to-data-frame-tp22385.html

Re: Add row IDs column to data frame

2015-04-05 Thread Xiangrui Meng
= dataDF.withColumn(ID, rowDF(ID)) Thanks

Re: Add row IDs column to data frame

2015-04-05 Thread Xiangrui Meng