What would be the most appropriate method to add a cumulative sum column to
a data frame? For example, assume I have the following data frame:
flag | price
-----|-----------------
   1 | 47.808764653746
   1 | 47.808764653746
   1 | 31.9869279512204
How can I create a data frame with an extra column containing the cumulative
sum of price?
Hi Cesar,
This was just added in Spark 1.4: you can use window functions in a
HiveContext to do it. Assuming you want to calculate the cumulative sum for
every flag:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

df.select(
  $"flag",
  $"price",
  // running sum of all rows up to the current one within each flag;
  // ordering by price here is just for the example -- order by whatever
  // column defines your row order
  sum($"price").over(Window.partitionBy($"flag").orderBy($"price")
    .rowsBetween(Long.MinValue, 0)).as("cumulative_sum"))
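To sanity-check what such a per-flag running sum produces, here is a small
sketch on plain Scala collections, using the three rows from the example
above (no Spark required; `flag` and `price` are the columns from the
question):

```scala
// Sketch: per-flag running sum on plain collections, mirroring
// what the window query computes.
val rows = Seq(
  (1, 47.808764653746),
  (1, 47.808764653746),
  (1, 31.9869279512204))

val withCumSum = rows
  .groupBy(_._1)                 // partition by flag
  .flatMap { case (flag, group) =>
    // scanLeft yields running totals; drop the initial 0.0 seed
    val sums = group.map(_._2).scanLeft(0.0)(_ + _).tail
    group.zip(sums).map { case ((f, p), s) => (f, p, s) }
  }
  .toList
```

The last tuple's third field should be the sum of all three prices,
about 127.604.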