[ 
https://issues.apache.org/jira/browse/FLINK-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15751122#comment-15751122
 ] 

ASF GitHub Bot commented on FLINK-2186:
---------------------------------------

GitHub user tonycox opened a pull request:

    https://github.com/apache/flink/pull/3012

    [FLINK-2186] Add readCsvAsRow methods to CsvReader and scala ExecutionEnv

    Thanks for contributing to Apache Flink. Before you open your pull request,
    please take the following check list into consideration.
    If your changes take all of the items into account, feel free to open your
    pull request. For more information and/or questions please refer to the
    [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
    In addition to going through the list, please provide a meaningful
    description of your changes.
    
    - [ ] General
      - The pull request references the related JIRA issue ("[FLINK-XXX] Jira
        title text")
      - The pull request addresses only one issue
      - Each commit in the PR has a meaningful commit message (including the
        JIRA id)
    
    - [ ] Documentation
      - Documentation has been added for new functionality
      - Old documentation affected by the pull request has been updated
      - JavaDoc for public methods has been added
    
    - [ ] Tests & Build
      - Functionality added by the pull request is covered by tests
      - `mvn clean verify` has been executed successfully locally or a Travis
        build has passed
    
    Rework CSV import to support very wide files

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tonycox/flink FLINK-2186

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3012.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3012
    
----
commit 905e1fe5f530bcec92af3d4e3ebc8f2c0e26cdf9
Author: tonycox <[email protected]>
Date:   2016-12-12T11:51:56Z

    [FLINK-2186] Add readCsvAsRow methods to CsvReader and scala ExecutionEnv.

----


> Rework CSV import to support very wide files
> --------------------------------------------
>
>                 Key: FLINK-2186
>                 URL: https://issues.apache.org/jira/browse/FLINK-2186
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library, Scala API
>            Reporter: Theodore Vasiloudis
>            Assignee: Anton Solovev
>
> In the current readCsvFile implementation, importing CSV files with many 
> columns ranges from cumbersome to impossible.
> For example, to import an 11-column file we need to write:
> {code}
> val cancer = env.readCsvFile[(String, String, String, String, String, String, 
> String, String, String, String, 
> String)]("/path/to/breast-cancer-wisconsin.data")
> {code}
> For many use cases in Machine Learning we might have CSV files with thousands 
> or millions of columns that we want to import as vectors.
> In that case using the current readCsvFile method becomes impossible.
> We therefore need to rework the current function, or create a new one that 
> will allow us to import CSV files with an arbitrary number of columns.
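
To illustrate the request in the issue description, here is a minimal sketch in plain Java of parsing one CSV line into a vector whose length is taken from the data instead of from a fixed tuple type. It does not use Flink and is not the code from this pull request: the class name WideCsvSketch, the method parseLine, and the sample line are hypothetical, and the sketch assumes every field is numeric and unquoted.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Flink or of pull request #3012.
public class WideCsvSketch {

    // Splits one CSV line on the given delimiter and parses every field as a double.
    // The number of columns comes from the line itself, so no per-column type
    // declaration is needed. Assumes numeric, unquoted fields.
    static double[] parseLine(String line, String delimiter) {
        // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up 11-column sample line.
        double[] row = parseLine("5, 1, 1, 1, 2, 1, 3, 1, 1, 2, 0.5", ",");
        System.out.println(row.length + " columns: " + Arrays.toString(row));
    }
}
```

The same loop works unchanged for a line with thousands of fields, which is the case the tuple-based signature cannot express. A real reader would also have to handle quoting, missing values, and non-numeric columns.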



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
