[ https://issues.apache.org/jira/browse/SPARK-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-3430.
------------------------------
    Resolution: Won't Fix

The PR indicates this is Won't Fix.

> Introduce ValueIncrementableHashMapAccumulator to compute Histogram and other 
> statistical metrics
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3430
>                 URL: https://issues.apache.org/jira/browse/SPARK-3430
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Suraj Satishkumar Sheth
>
> Pull request : https://github.com/apache/spark/pull/2314
> Currently, we don't have a hash map that can be used as an accumulator to 
> produce a histogram or distribution. This class provides a customized 
> HashMap implementation whose values can be incremented,
> e.g. map += (a, 1) followed by map += (a, 6) yields (a, 7).
> This has various applications, such as computing histograms, generating 
> sampling strategies, and computing statistical metrics in MLlib.
> Example usage:
>     // Accumulators are shipped to tasks by referencing them in the
>     // closure; no sc.broadcast wrapper is needed.
>     val countMap = sc.accumulableCollection(
>       new ValueIncrementableHashMapAccumulator[Int]())
>
>     data.foreach { record =>
>       val fields = record.split("\t")
>       // Key 0 counts the total number of records.
>       countMap += (0 -> 1L)
>       // Key i (1-based) counts records whose i-th field parses as a number.
>       for ((field, idx) <- fields.zipWithIndex) {
>         try {
>           field.toDouble
>           countMap += ((idx + 1) -> 1L)
>         } catch {
>           case _: NumberFormatException => // not numeric; skip this field
>         }
>       }
>     }
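Since the proposed class was never merged, its exact API is not known. The following is a minimal local sketch, assuming Long values and using the hypothetical names `ValueIncrementableHashMap` and `numericFieldHistogram`. It shows the increment-on-`+=` semantics from the description, the per-key summing merge an accumulator also needs when combining results from different tasks, and the example's per-column counting run sequentially without Spark.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical sketch (not the class from the PR): a hash map whose
// `+=` adds to the existing value instead of overwriting it.
final class ValueIncrementableHashMap[K] extends java.io.Serializable {
  private val counts = mutable.HashMap.empty[K, Long]

  // Add `delta` to the value stored for `key`; absent keys start at 0.
  def +=(kv: (K, Long)): this.type = {
    val (key, delta) = kv
    counts(key) = counts.getOrElse(key, 0L) + delta
    this
  }

  // Merge another partial map into this one by summing per-key values,
  // as an accumulator must do with partial results from different tasks.
  def ++=(other: ValueIncrementableHashMap[K]): this.type = {
    other.counts.foreach(kv => this += kv)
    this
  }

  def get(key: K): Option[Long] = counts.get(key)
  def toMap: Map[K, Long] = counts.toMap
}

// Hypothetical helper: the per-column counting from the example usage,
// run sequentially. Key 0 counts records; key i counts records whose
// i-th (1-based) tab-separated field parses as a Double.
def numericFieldHistogram(records: Seq[String]): ValueIncrementableHashMap[Int] = {
  val hist = new ValueIncrementableHashMap[Int]
  records.foreach { record =>
    hist += (0 -> 1L)
    record.split("\t").zipWithIndex.foreach { case (field, idx) =>
      if (Try(field.toDouble).isSuccess) hist += ((idx + 1) -> 1L)
    }
  }
  hist
}

val records = Seq("a\t1.5\t2", "b\tx\t3", "c\t4\ty")
val hist = numericFieldHistogram(records)
// key 0 -> 3, key 2 -> 2, key 3 -> 2; key 1 is absent (no numeric values)
println(hist.toMap.toSeq.sorted)
```

In Spark, a class like this would still need to be adapted to an accumulator interface so that `+=` runs inside tasks and the merge runs on the driver; that wiring is omitted here.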



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

