Re: Add row IDs column to data frame

2017-01-12 Thread akbar501
The following are 2 different approaches to adding an id/index to RDDs and 1 approach to adding an index to a DataFrame.

Add an index column to an RDD

```scala
// RDD
val dataRDD = sc.textFile("./README.md")
// Add index then set index as key in map() transformation
// Results in RDD[(Long,
```
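The message above is cut off in this listing, so the two RDD approaches it goes on to describe are not visible here. As a rough sketch only, and assuming Spark 2.x or later run from spark-shell or a Scala script, two standard RDD methods that attach an id to each element are `zipWithIndex` and `zipWithUniqueId`. The sample data, the app name and the variable names `lines`, `indexed` and `uniqueIds` are made up for illustration and are not taken from the original post.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration; in spark-shell getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("row-id-rdd-sketch") // hypothetical app name
  .getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data standing in for sc.textFile("./README.md").
val lines = sc.parallelize(Seq("alpha", "beta", "gamma"), numSlices = 2)

// Option 1: zipWithIndex assigns consecutive indices starting at 0.
// The map() swaps each pair so the index becomes the key: RDD[(Long, String)].
val indexed = lines.zipWithIndex().map { case (line, idx) => (idx, line) }

// Option 2: zipWithUniqueId assigns ids that are unique but not necessarily consecutive.
// Unlike zipWithIndex, it does not need to trigger a Spark job to compute them.
val uniqueIds = lines.zipWithUniqueId().map { case (line, id) => (id, line) }

indexed.collect().foreach(println)
uniqueIds.collect().foreach(println)
```

Which of the two fits depends on whether the ids have to be consecutive: `zipWithIndex` gives 0, 1, 2, ... in partition order, while `zipWithUniqueId` only guarantees uniqueness.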

Re: Add row IDs column to data frame

2017-01-11 Thread akbar501
RDDs, DataFrames and Datasets are all immutable, so you cannot edit any of them in place. Instead, the approach you should take is to call transformation functions on the RDD/DataFrame/Dataset. RDD transformation functions will return a new RDD, DataFrame transformations will return a new DataFrame and
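To make the immutability point concrete, here is a minimal sketch, assuming Spark 2.x or later in spark-shell or a Scala script. It uses `withColumn` together with `monotonically_increasing_id()` from `org.apache.spark.sql.functions` to derive a new DataFrame that carries a row id column. This is one possible way to do it and not necessarily the approach the original message describes; the column names `value` and `row_id`, the sample data and the app name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

// Local session for illustration; in spark-shell getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("row-id-dataframe-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample DataFrame with a single column named "value".
val df = Seq("alpha", "beta", "gamma").toDF("value")

// withColumn is a transformation: it returns a new DataFrame and leaves df as it was.
// The generated ids are unique and increasing, but not guaranteed to be consecutive.
val withId = df.withColumn("row_id", monotonically_increasing_id())

df.printSchema()     // still only the "value" column
withId.printSchema() // "value" plus the new "row_id" column
```

If consecutive ids are required, one alternative is to go through the underlying RDD with `zipWithIndex` and convert back to a DataFrame, at the cost of an extra pass over the data.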