Hi,

I just checked, and I can see that DataFrame has a method called withColumn:

def withColumn(colName: String, col: Column): DataFrame

Returns a new DataFrame by adding a column.

API docs:
http://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrame.html
http://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Column.html
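
For illustration, here is a minimal, untested-on-your-data sketch of how withColumn might be used. It assumes a Spark 2.x or later SparkSession running locally; the object name WithColumnSketch, the helper addColumns, and the column names "name", "value", "source" and "valueTimesTwo" are all made up for the example. As far as I know, the Column you pass has to be a literal or an expression over columns of the same DataFrame, so this does not by itself attach a column that lives in a different data set.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object WithColumnSketch {
  // Hypothetical helper: adds one constant column and one column
  // derived from the existing (assumed) numeric column "value".
  def addColumns(df: DataFrame): DataFrame =
    df.withColumn("source", lit("const"))
      .withColumn("valueTimesTwo", col("value") * 2)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for a quick experiment.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("withColumn-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with columns "name" and "value".
    val df = Seq(("a", 1), ("b", 2)).toDF("name", "value")

    // withColumn returns a new DataFrame; df itself is unchanged.
    addColumns(df).show()

    spark.stop()
  }
}
```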

I can't test it right now, but I think it should work.

As I see it, the whole idea of DataFrames is to make them behave like data
frames in R, and in R you can do that easily.

It was late last night and I was tired, but my idea was this: iterate over
the first set and add an index to every log using accumulators, then iterate
over the other set and add an index from another accumulator, then create
tuples keyed by those indexes and join. It is ugly and not efficient, and
you should avoid it. :]
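
A rough sketch of the index-and-join idea, with one substitution stated plainly: it uses RDD.zipWithIndex rather than accumulators to produce the indexes, because an accumulator's value can only be read on the driver, not inside a task. The object name AppendColumnSketch, the helper appendColumn, and the sample data are made up for the example, and it assumes both RDDs have the same number of elements in a meaningful order.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object AppendColumnSketch {
  // Hypothetical helper: appends the i-th element of `extra`
  // to the i-th row of `rows`, matching them by position.
  def appendColumn(rows: RDD[Array[String]],
                   extra: RDD[String]): RDD[Array[String]] = {
    // zipWithIndex yields (element, index); flip it so the index is the key.
    val indexedRows = rows.zipWithIndex().map { case (row, i) => (i, row) }
    val indexedExtra = extra.zipWithIndex().map { case (v, i) => (i, v) }

    // Inner join on the index: a row with no partner is dropped.
    indexedRows.join(indexedExtra)
      .sortByKey()                      // restore the original row order
      .values
      .map { case (row, v) => row :+ v } // append the new value as last column
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("append-column-sketch")
    val sc = new SparkContext(conf)

    // Made-up sample data standing in for oldRDD and the column to append.
    val oldRDD = sc.parallelize(Seq(Array("a", "b"), Array("c", "d")))
    val colToAppend = sc.parallelize(Seq("1", "2"))

    appendColumn(oldRDD, colToAppend)
      .collect()
      .foreach(row => println(row.mkString(",")))

    sc.stop()
  }
}
```

The join shuffles both data sets and the sort adds another shuffle, so this is still not cheap on large inputs.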

Best

Bojan

On Thu, Apr 9, 2015 at 1:35 AM, barmaley [via Apache Spark User List] <
ml-node+s1001560n22430...@n3.nabble.com> wrote:

> Hi Bojan,
>
> Could you please expand your idea on how to append to RDD? I can think of
> how to append a constant value to each row on RDD:
>
> //oldRDD - RDD[Array[String]]
> val c = "const"
> val newRDD = oldRDD.map(r=>c+:r)
>
> But how to append a custom column to RDD? Something like:
>
> val colToAppend = sc.makeRDD(1 to oldRDD.count().toInt)
> //or sc.parallelize(1 to oldRDD.count().toInt)
> //or (1 to oldRDD.count().toInt).toArray
>




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Append-column-to-Data-Frame-or-RDD-tp22385p22432.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
