[
https://issues.apache.org/jira/browse/FLINK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15636993#comment-15636993
]
ASF GitHub Bot commented on FLINK-4964:
---------------------------------------
Github user tfournier314 commented on the issue:
https://github.com/apache/flink/pull/2740
I've changed my code so that I now have mapping: DataSet[(String, Long)]:
    val mapping = input
      .mapWith( s => (s, 1) )
      .groupBy( 0 )
      .reduce( (a, b) => (a._1, a._2 + b._2) )
      .partitionByRange( 1 )
      .zipWithIndex
      .mapWith { case (id, (label, count)) => (label, id) }
Given a new DataSet[String] called rawInput, I'd like to use this mapping
to associate each "label" in rawInput with an ID (the Long value in
mapping).
Is this possible with a streaming approach (using a join, for example)?
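For illustration, here is a minimal sketch of how such a join might look with
the DataSet Scala API. It is not taken from the pull request: the object name,
the sample data and the name "indexed" are hypothetical, and the small
fromElements mapping only stands in for the mapping built above. It assumes a
Flink version contemporary with this thread.

```scala
import org.apache.flink.api.scala._

// Hypothetical sketch, not the PR's code.
object StringIndexerJoinSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Stand-in for the (label, id) mapping computed above (sample data is made up).
    val mapping: DataSet[(String, Long)] =
      env.fromElements(("a", 0L), ("b", 1L), ("c", 2L))

    // New input whose labels should be replaced by / paired with their IDs.
    val rawInput: DataSet[String] = env.fromElements("b", "a", "b", "c")

    // Inner join on the label: the key of rawInput is the string itself,
    // the key of mapping is its first tuple field.
    val indexed: DataSet[(String, Long)] = rawInput
      .join(mapping)
      .where((label: String) => label)
      .equalTo((entry: (String, Long)) => entry._1)
      .apply((label, entry) => (label, entry._2))

    // print() triggers execution and writes the (label, id) pairs to stdout.
    indexed.print()
  }
}
```

Because this is an inner join, labels in rawInput that do not appear in mapping
are dropped from the result. If unseen labels need to be kept or reported, an
outer join would be the place to start.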
> FlinkML - Add StringIndexer
> ---------------------------
>
> Key: FLINK-4964
> URL: https://issues.apache.org/jira/browse/FLINK-4964
> Project: Flink
> Issue Type: New Feature
> Reporter: Thomas FOURNIER
> Priority: Minor
>
> Add StringIndexer as described here:
> http://spark.apache.org/docs/latest/ml-features.html#stringindexer
> This will be added in package preprocessing of FlinkML
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)