Just in case you are more comfortable with SQL,
row_number() over ()
should also generate a unique id.
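As a rough sketch of the SQL route, assuming Spark 2.x or later and a local `SparkSession` (the object name, the view name `rows`, and the sample data are illustrative and not from the thread). As far as I know, recent Spark versions reject `row_number()` over an empty window specification, so this sketch adds an `ORDER BY`:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RowNumberSketch {
  // Adds a 1-based, consecutive "id" column using a SQL window function.
  // Assumes the input has columns named col1 and col2.
  def withRowNumber(spark: SparkSession, df: DataFrame): DataFrame = {
    df.createOrReplaceTempView("rows") // hypothetical view name
    spark.sql(
      "SELECT row_number() OVER (ORDER BY col1, col2) AS id, col1, col2 FROM rows")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("row-number-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = List((1, 1), (1, 1), (1, 2), (2, 2)).toDF("col1", "col2")
    withRowNumber(spark, df).show()
    spark.stop()
  }
}
```

A window without `PARTITION BY` moves all rows into a single partition, so this is probably only reasonable for small or medium inputs.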
On Thu, Jan 12, 2017 at 7:00 PM, akbar501 wrote:
> The following are 2 different approaches to adding an id/index to RDDs and 1
> approach to adding an index to a
The following are 2 different approaches to adding an id/index to RDDs and 1
approach to adding an index to a DataFrame.
Add an index column to an RDD
```scala
// RDD
val dataRDD = sc.textFile("./README.md")
// Add index then set index as key in map() transformation
// Results in RDD[(Long,
```
RDDs, DataFrames and Datasets are all immutable. So, you cannot edit any of
these. However, the approach you should take is to call transformation
functions on the RDD/DataFrame/Dataset. RDD transformation functions will
return a new RDD, DataFrame transformations will return a new DataFrame and
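
To illustrate the point about immutability, here is a minimal sketch, assuming Spark 2.x or later (the object name and sample data are made up for illustration). It uses `monotonically_increasing_id()`, which yields unique and increasing, but not necessarily consecutive, 64-bit ids:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.monotonically_increasing_id

object ImmutableTransformSketch {
  // Returns a NEW DataFrame with an extra "id" column; the input is not modified.
  def withId(df: DataFrame): DataFrame =
    df.withColumn("id", monotonically_increasing_id())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("immutable-transform-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = List((1, 1), (1, 1), (1, 2), (2, 2)).toDF("col1", "col2")
    val dfWithId = withId(df)

    println(df.columns.mkString(", "))       // the original still has only its own columns
    println(dfWithId.columns.mkString(", ")) // the new DataFrame carries the added "id"
    spark.stop()
  }
}
```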
Hi,
I've created a simple example using the withColumn method, but it throws an
error. Try:
val df = List(
(1,1),
(1,1),
(1,2),
(2,2)
).toDF("col1", "col2")
val index_col = sqlContext.range( df.count() ).col("id")
val df_with_index = df.withColumn("index", index_col)
The error I get is:
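
The error text is cut off in this excerpt. One plausible cause is that `withColumn` only accepts expressions over the DataFrame it is called on, while `index_col` belongs to the separate DataFrame returned by `sqlContext.range`. A possible workaround, sketched here under the assumption of Spark 2.x or later (the object and helper names are hypothetical), is to go through the RDD API and use `zipWithIndex`:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object ZipWithIndexSketch {
  // Appends a consecutive, 0-based "index" column by zipping the underlying RDD.
  def withIndex(spark: SparkSession, df: DataFrame): DataFrame = {
    // Pair every row with its position, then append that position to the row.
    val indexedRows = df.rdd.zipWithIndex().map {
      case (row, idx) => Row.fromSeq(row.toSeq :+ idx)
    }
    // Extend the original schema with the new Long column.
    val schema = StructType(
      df.schema.fields :+ StructField("index", LongType, nullable = false))
    spark.createDataFrame(indexedRows, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("zip-with-index-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = List((1, 1), (1, 1), (1, 2), (2, 2)).toDF("col1", "col2")
    withIndex(spark, df).show()
    spark.stop()
  }
}
```

Unlike `monotonically_increasing_id()`, `zipWithIndex` produces consecutive indices, at the cost of an extra Spark job when the RDD has more than one partition.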
--
If you reply to this email, your message will be added to the discussion
below:
http://apache-spark-user-list.1001560.n3.nabble.com/Add-row-IDs-column-to-data-frame-tp22385p22430.html
To start a new topic under Apache Spark User List, email
ml-node+s1001560n1...@n3.nabble.com
to dataDF.count().toInt).toDF(ID)
dataDF = dataDF.withColumn(ID, rowDF(ID))
Thanks
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Add-row-IDs-column-to-data-frame-tp22385.html
Sent from the Apache Spark User List mailing list archive at Nabble.com