[ 
https://issues.apache.org/jira/browse/FLINK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15636993#comment-15636993
 ] 

ASF GitHub Bot commented on FLINK-4964:
---------------------------------------

Github user tfournier314 commented on the issue:

    https://github.com/apache/flink/pull/2740
  
    I've changed my code so that I now have mapping: DataSet[(String, Long)]
    
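    import org.apache.flink.api.scala._
    import org.apache.flink.api.scala.extensions._  // provides mapWith
    import org.apache.flink.api.scala.utils._       // provides zipWithIndex

    val mapping = input
          .mapWith( s => (s, 1) )                    // (label, 1)
          .groupBy( 0 )
          .reduce( (a, b) => (a._1, a._2 + b._2) )   // (label, count)
          .partitionByRange( 1 )                     // range-partition on count
          .zipWithIndex                              // (id, (label, count))
          .mapWith { case (id, (label, count)) => (label, id) }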
    
    Given a new DataSet[String] called rawInput, I'd like to use this mapping
    to associate each "label" in rawInput with an ID (the Long value in
    mapping).
    
    Is this possible with a streaming approach (would it need a join, for example)?
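
For illustration, one way the lookup could be written in the batch DataSet API is an inner join of rawInput against mapping on the label. This is a minimal sketch, not code from the pull request: the object name LabelIndexJoinSketch, the helper indexLabels, and the sample data are hypothetical, and it assumes the Flink Scala DataSet API is on the classpath. It does not settle the streaming question.

```scala
import org.apache.flink.api.scala._

// Hypothetical sketch: names and sample data are assumptions, not part of the PR.
object LabelIndexJoinSketch {

  // Pairs every raw label with its ID from `mapping`.
  // Inner join: labels that are missing from `mapping` produce no output.
  def indexLabels(
      rawInput: DataSet[String],
      mapping: DataSet[(String, Long)]): DataSet[(String, Long)] =
    rawInput
      .join(mapping)
      .where(label => label)   // key of the left side: the label itself
      .equalTo(_._1)           // key of the right side: the label in the mapping
      .apply((label, entry) => (label, entry._2))

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Stand-in for the mapping built in the comment above.
    val mapping = env.fromElements(("a", 0L), ("b", 1L), ("c", 2L))
    val rawInput = env.fromElements("b", "a", "b", "d")

    // collect() triggers execution and returns the result to the client.
    indexLabels(rawInput, mapping).collect().foreach(println)
  }
}
```

Because this is an inner join, a label that does not occur in mapping ("d" in the sample data) is dropped; keeping such labels would need an outer join or a default ID.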



> FlinkML - Add StringIndexer
> ---------------------------
>
>                 Key: FLINK-4964
>                 URL: https://issues.apache.org/jira/browse/FLINK-4964
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Thomas FOURNIER
>            Priority: Minor
>
> Add StringIndexer as described here:
> http://spark.apache.org/docs/latest/ml-features.html#stringindexer
> This will be added to the preprocessing package of FlinkML.



