dataframe cumulative sum

2015-05-29 Thread Cesar Flores
What would be the most appropriate method to add a cumulative sum column to a data frame? For example, assuming that I have the following data frame:

flag | price
-----|-----------------
1    | 47.808764653746
1    | 47.808764653746
1    | 31.9869279512204

How can I create a data frame with an extra

Re: dataframe cumulative sum

2015-05-29 Thread Yin Huai
Hi Cesar,

We just added it in Spark 1.4. In Spark 1.4, you can use a window function in HiveContext to do it. Assuming you want to calculate the cumulative sum for every flag:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    df.select(
      $"flag",
      $"price",
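
The reply is cut off in the archive at this point. As a rough illustration of the window-function approach it describes, here is a minimal sketch and not the original author's code. It assumes a newer Spark release where `SparkSession` is available (on Spark 1.4 you would build the data frame from a `HiveContext` instead, as the reply says). The object name `CumulativeSumSketch`, the helper `withCumulativeSum`, and the output column `cum_price` are hypothetical names chosen for the example, and ordering by `price` is also an assumption, since a data frame has no inherent row order and the running total needs one.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum}

object CumulativeSumSketch {

  // Adds a hypothetical "cum_price" column holding the running total of
  // price within each flag. Rows are ordered by price (an assumption).
  def withCumulativeSum(df: DataFrame): DataFrame = {
    // Frame: from the first row of the partition up to the current row.
    // Long.MinValue means "unbounded preceding"; 0 means "current row".
    val w = Window
      .partitionBy("flag")
      .orderBy("price")
      .rowsBetween(Long.MinValue, 0)

    df.select(
      col("flag"),
      col("price"),
      sum(col("price")).over(w).as("cum_price")
    )
  }

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cumsum-sketch")
      .getOrCreate()
    import spark.implicits._

    // The three rows from the question.
    val df = Seq(
      (1, 47.808764653746),
      (1, 47.808764653746),
      (1, 31.9869279512204)
    ).toDF("flag", "price")

    withCumulativeSum(df).show(false)
    spark.stop()
  }
}
```

Because the frame is defined with `rowsBetween`, rows with equal prices each get their own running total. A range-based frame would instead give tied rows the same value. If the data has a natural ordering column such as a timestamp, ordering the window by that column is usually closer to what a cumulative sum is meant to express.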