[
https://issues.apache.org/jira/browse/FLINK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665260#comment-15665260
]
ASF GitHub Bot commented on FLINK-4964:
---------------------------------------
Github user tfournier314 commented on the issue:
https://github.com/apache/flink/pull/2740
@greghogan I haven't pushed the code yet because my tests still give
incorrect results.
For example, the following code:
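import org.apache.flink.api.common.operators.Order
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.utils._ // provides zipWithIndex

val env = ExecutionEnvironment.getExecutionEnvironment
val fitData = env.fromCollection(
  List("a","b","c","a","a","d","a","a","a","b","b","c","a","c","b","c","b"))
fitData.map(s => (s, 1))
  .groupBy(0)
  .reduce((a, b) => (a._1, a._2 + b._2)) // count occurrences per string
  .partitionByRange(1)                   // range-partition on the count
  .sortPartition(1, Order.DESCENDING)    // sort each partition by count
  .zipWithIndex
  .print()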
returns
(0,(b,5))
(1,(c,4))
(2,(d,1))
(3,(a,7))
And I would like the following:
(1,(b,5))
(2,(c,4))
(3,(d,1))
(0,(a,7))
Even if the order inside each partition is preserved (with mapPartitions), the
order between partitions is not the one I want. Is that right?
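To make the question concrete, here is a minimal sketch in plain Scala collections (not Flink) that models what seems to be happening. It assumes that range partitioning places smaller counts in lower-numbered partitions, and that indices are then handed out partition by partition. The object name, the two-partition layout and the boundary at 5 are hypothetical, chosen only so that the model reproduces the output above.

```scala
object RangeIndexSketch {
  // Same input as in the example above.
  val input: List[String] =
    List("a","b","c","a","a","d","a","a","a","b","b","c","a","c","b","c","b")

  // Count occurrences per string: a -> 7, b -> 5, c -> 4, d -> 1.
  val counts: List[(String, Int)] =
    input.groupBy(identity).map { case (s, xs) => (s, xs.size) }.toList

  // Hypothetical range boundary: counts <= 5 go to partition 0, the rest to partition 1.
  def partitionOf(count: Int): Int = if (count <= 5) 0 else 1

  // Partitions in ascending partition order, each sorted by descending count.
  val partitions: List[List[(String, Int)]] =
    counts.groupBy { case (_, n) => partitionOf(n) }
      .toList
      .sortBy { case (p, _) => p }
      .map { case (_, elems) => elems.sortBy { case (_, n) => -n } }

  // Indexing partition by partition reproduces the observed result.
  val observed: List[(Int, (String, Int))] =
    partitions.flatten.zipWithIndex.map(_.swap)

  // A single global descending sort gives the indices I am after.
  val desired: List[(Int, (String, Int))] =
    counts.sortBy { case (_, n) => -n }.zipWithIndex.map(_.swap)

  def main(args: Array[String]): Unit = {
    observed.foreach(println)
    println("---")
    desired.foreach(println)
  }
}
```

Under these assumptions the sort direction inside each partition is fine, but the partitions themselves are visited in ascending range order, which would explain why (a,7) ends up with the last index.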
> FlinkML - Add StringIndexer
> ---------------------------
>
> Key: FLINK-4964
> URL: https://issues.apache.org/jira/browse/FLINK-4964
> Project: Flink
> Issue Type: New Feature
> Reporter: Thomas FOURNIER
> Priority: Minor
>
> Add StringIndexer as described here:
> http://spark.apache.org/docs/latest/ml-features.html#stringindexer
> This will be added in package preprocessing of FlinkML
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)