I have an RDD of type

    RDD[(String, Iterable[(com.ebay.ep.poc.spark.reporting.process.detail.model.DetailInputRecord,
                           com.ebay.ep.poc.spark.reporting.process.model.DataRecord)])]

Here the String is the key and the Iterable holds the list of tuples for that key. I got the above RDD after doing a groupByKey. I later want to compute the number of items for each key.
If the number of items is very large, have you considered using probabilistic counting? The HyperLogLogPlus class (https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java) from stream-lib (https://github.com/addthis/stream-lib) gives an approximate distinct count in a small, fixed amount of memory.
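For reference, counting with HyperLogLogPlus looks roughly like this (a minimal sketch; `items` stands for any collection of the values whose distinct count you want, and precision 14 is an arbitrary choice giving roughly 1% relative error):

```scala
import com.clearspring.analytics.stream.cardinality.HyperLogLogPlus

// Build a sketch with precision p = 14; memory use is fixed regardless
// of how many elements are offered.
val hll = new HyperLogLogPlus(14)
items.foreach(hll.offer)        // feed each element into the sketch
val approxDistinct: Long = hll.cardinality()  // estimated distinct count
```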
I modified it to:

    detailInputsToGroup.map {
      case (detailInput, dataRecord) =>
        val key: StringBuilder = new StringBuilder
        dimensions.foreach { dimension =>
          key ++= {
            // assumption: look the dimension's value up on the input record
            detailInput.get(dimension).toString
          }
        }
        (key.toString, (detailInput, dataRecord))
    }