[jira] [Commented] (FLINK-4964) FlinkML - Add StringIndexer

ASF GitHub Bot (JIRA) Mon, 14 Nov 2016 14:38:17 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665260#comment-15665260
 ]


ASF GitHub Bot commented on FLINK-4964:
---------------------------------------

Github user tfournier314 commented on the issue:

    https://github.com/apache/flink/pull/2740
  
    @greghogan I haven't pushed the code yet because my tests are still incorrect.
    Specifically, the following code:
    
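    import org.apache.flink.api.common.operators.Order
    import org.apache.flink.api.scala._
    import org.apache.flink.api.scala.utils._ // provides zipWithIndex

    val env = ExecutionEnvironment.getExecutionEnvironment
    val fitData = env.fromCollection(
      List("a","b","c","a","a","d","a","a","a","b","b","c","a","c","b","c","b"))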
    fitData.map(s => (s,1)).groupBy(0)
          .reduce((a,b) => (a._1, a._2 + b._2))
          .partitionByRange(1)
          .sortPartition(1, Order.DESCENDING)
          .zipWithIndex
          .print()
    
    returns 
    
    (0,(b,5))
    (1,(c,4))
    (2,(d,1))
    (3,(a,7))
    
    And I would like the following:
    
    (1,(b,5))
    (2,(c,4))
    (3,(d,1))
    (0,(a,7))
    
    So even though the order inside each partition is preserved (with 
mapPartitions), the order between partitions is not what I need. Is that right?
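
The behaviour described above can be modelled with plain Scala collections. This is only a sketch and not Flink code: it assumes that range partitioning on the count places the smaller counts in the first partition and the larger counts in the second, and the boundary value 5 is hypothetical (Flink derives real boundaries by sampling). The object name and the `perPartition` and `global` values are made up for illustration. Under those assumptions, sorting each partition in descending order and then numbering elements in partition order gives the indices reported above, while a single global descending sort gives the desired ones.

```scala
// Plain Scala collections only; this models the pipeline, it does not use Flink.
object GlobalIndexSketch {
  val labels: List[String] =
    List("a","b","c","a","a","d","a","a","a","b","b","c","a","c","b","c","b")

  // Count occurrences per label (the map / groupBy / reduce step).
  val counts: List[(String, Int)] =
    labels.groupBy(identity).map { case (s, xs) => (s, xs.size) }.toList

  // ASSUMPTION: range partitioning puts low counts in partition 0 and
  // high counts in partition 1. The boundary 5 is hypothetical.
  val partitions: List[List[(String, Int)]] = {
    val (low, high) = counts.partition(_._2 <= 5)
    List(low, high)
  }

  // Sort inside each partition (descending count), then assign indices
  // in partition order, as a zip-with-index over partitions would.
  val perPartition: List[(Long, (String, Int))] =
    partitions
      .flatMap(_.sortBy(-_._2))
      .zipWithIndex
      .map { case (e, i) => (i.toLong, e) }

  // One global descending sort before indexing: the most frequent label gets 0.
  val global: List[(Long, (String, Int))] =
    counts
      .sortBy(-_._2)
      .zipWithIndex
      .map { case (e, i) => (i.toLong, e) }

  def main(args: Array[String]): Unit = {
    println("per-partition sort:")
    perPartition.foreach(println)
    println("global sort:")
    global.foreach(println)
  }
}
```

In this model the only difference between the two results is whether the descending sort is applied per partition or across the whole data set, which suggests that the indexing step needs a total order on the counts rather than a per-partition one.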
    



> FlinkML - Add StringIndexer
> ---------------------------
>
>                 Key: FLINK-4964
>                 URL: https://issues.apache.org/jira/browse/FLINK-4964
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Thomas FOURNIER
>            Priority: Minor
>
> Add StringIndexer as described here:
> http://spark.apache.org/docs/latest/ml-features.html#stringindexer
> This will be added in package preprocessing of FlinkML



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
