I want to create something like an RDBMS index on keyPnl over pnl_type_code so that
the group-by can be done efficiently. How do I achieve that?
Currently the code below blows out of memory in Spark on 60 GB of data.
keyPnl is a very large file. We have been stuck for a week, trying Kryo,
mapValues, etc., but to no avail.
We want to partition on pnl_type_code but have no idea how to do that.
Please advise.


  val keyPnl = pnl.filter(_.rf_level == "0").keyBy(f => f.portfolio_code)
  val keyPosition = positions.filter(_.pl0_code == "3").keyBy(f => f.portfolio_code)

  val JoinPnlPortfolio = keyPnl.leftOuterJoin(keyPosition)

  val result = JoinPnlPortfolio.groupBy(r => r._2._1.pnl_type_code)
                .mapValues(kv => kv.map(mapper).fold(List[Double]())(Vector.reduceVector _))
                .mapValues(kv => Var.percentile(kv, 0.99))
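
Is something along these lines the right direction? This is only a rough sketch, not
our real code: the Pnl/Position case classes, the plain Double standing in for the
mapped vector, and the partition count of 200 are made-up placeholders.

  import org.apache.spark.{SparkConf, SparkContext, HashPartitioner}
  import org.apache.spark.SparkContext._

  object PnlPartitionSketch {

    // Placeholder records: the real Pnl / Position classes have many more fields.
    case class Pnl(portfolio_code: String, pnl_type_code: String, rf_level: String, value: Double)
    case class Position(portfolio_code: String, pl0_code: String)

    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("pnl-sketch").setMaster("local[*]"))

      // Tiny stand-in data so the sketch runs on its own.
      val pnl = sc.parallelize(Seq(
        Pnl("P1", "T1", "0", 1.0), Pnl("P1", "T2", "0", 2.0), Pnl("P2", "T1", "0", 3.0)))
      val positions = sc.parallelize(Seq(Position("P1", "3"), Position("P2", "3")))

      // Same filters and join as in our current code.
      val keyPnl = pnl.filter(_.rf_level == "0").keyBy(_.portfolio_code)
      val keyPosition = positions.filter(_.pl0_code == "3").keyBy(_.portfolio_code)
      val joined = keyPnl.leftOuterJoin(keyPosition)

      // Re-key by pnl_type_code, the column we would like "indexed". Passing a
      // HashPartitioner to reduceByKey sends every record for one pnl_type_code to
      // the same partition, and values are combined map-side first, so no task has
      // to buffer a whole group the way groupBy + mapValues does.
      val byType = joined.map { case (_, (p, _)) => (p.pnl_type_code, p.value) }
      val perType = byType.reduceByKey(new HashPartitioner(200), _ + _)  // 200 is a guess

      // In the real job the value would be mapper(...), the reduce function would be
      // Vector.reduceVector, and Var.percentile(_, 0.99) would run on the single
      // reduced vector per pnl_type_code afterwards.
      perType.collect().foreach(println)

      sc.stop()
    }
  }

If reduceByKey with an explicit partitioner is not the right tool here, any pointers
would be appreciated.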



