I have an RDD of type (String,

Here the String is the key, and the value is a list of tuples for that key.
I got this RDD after doing a groupByKey. Later I want to compute the total
number of values for a given key and the number of unique values for the
same key, so I do this:

    val totalViCount = details.size.toLong
    val uniqueViCount =

How do I do this using reduceByKey?

*Full code:*

      val groupedDetail: RDD[(String, Iterable[(DetailInputRecord, DataRecord)])] =
        detailInputsToGroup.map {
        case (detailInput, dataRecord) =>
          val key: StringBuilder = new StringBuilder
          dimensions.foreach {
            dimension =>
              key ++= {

          (key.toString, (detailInput, dataRecord))

      groupedDetail.map {
        case (key, values) => {
          val valueList = values.toList

          // Compute dimensions (you can skip this)
          val (detailInput, dataRecord) = valueList.head
          val schema = SchemaUtil.outputSchema(_detail)
          val detailOutput = new DetailOutputRecord(detail, new
          DataUtil.populateDimensions(schema, dimensions.toArray,
            detailInput, dataRecord, detailOutput)

          val metricsData = metricProviders.flatMap {
            case (className, instance) =>
              val data = instance.getMetrics(valueList)
          metricsData.map { case (k, v) => detailOutput.put(k, v) }
          val wrap = new AvroKey[DetailOutputRecord](detailOutput)
          (wrap, NullWritable.get)

  def getMetrics(details: List[(DetailInputRecord, DataRecord)]) = {
    val totalViCount = details.size.toLong
    val uniqueViCount =
    new ViewItemCountMetric(totalViCount, uniqueViCount)

I understand that totalViCount can be implemented using reduceByKey. How
can I implement the unique count, given that I seem to need the full list
of values for a key to know which ones are unique?
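
One possible approach, sketched here under assumptions rather than taken from
the code above: carry a small accumulator per key that holds both the running
total and the set of distinct identifiers seen so far, and merge accumulators
pairwise. The merge is associative and commutative, which is what reduceByKey
needs. The names `ViCounts`, `single`, `merge`, `reduceByKeyLocal` and the
String `id` are hypothetical; `id` stands in for whatever field of
(DetailInputRecord, DataRecord) defines uniqueness. The local helper only
imitates reduceByKey on a plain Seq so the sketch runs without a cluster.

```scala
// Per-key accumulator: running total plus the distinct ids seen so far.
final case class ViCounts(total: Long, unique: Set[String])

object ViCounts {
  // Accumulator for one record. `id` is a hypothetical stand-in for the
  // field that decides whether two values count as the same item.
  def single(id: String): ViCounts = ViCounts(1L, Set(id))

  // Associative and commutative merge of two partial results.
  def merge(a: ViCounts, b: ViCounts): ViCounts =
    ViCounts(a.total + b.total, a.unique ++ b.unique)
}

object ViCountSketch {
  // With Spark, assuming `keyed: RDD[(String, String)]` of (key, id) pairs,
  // the same functions would plug in roughly as:
  //   keyed.mapValues(ViCounts.single).reduceByKey(ViCounts.merge)

  // Local imitation of reduceByKey on an in-memory sequence.
  def reduceByKeyLocal(pairs: Seq[(String, String)]): Map[String, ViCounts] =
    pairs
      .map { case (key, id) => (key, ViCounts.single(id)) }
      .groupBy(_._1)
      .map { case (key, vs) => key -> vs.map(_._2).reduce(ViCounts.merge) }
}
```

The unique count for a key is then `unique.size`. The set still grows with the
number of distinct values per key, so this mainly helps when there are many
duplicates; if an estimate is acceptable, Spark's countApproxDistinctByKey on
pair RDDs may be worth a look instead.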

