[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files

ASF GitHub Bot (JIRA) Mon, 13 Jul 2015 01:58:04 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624390#comment-14624390
 ]


ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user andralungu commented on the pull request:

    https://github.com/apache/flink/pull/847#issuecomment-120856171
  
    Hi,
    
    I just had a closer look at this PR and it made me seriously question the 
utility of a `Graph.fromCSV` method. Why? First of all, because it is more 
limited than the regular `env.readCsvFile()`: it does not allow POJOs, and 
supporting them would be a bit tedious. There would need to be methods for 
2 to n fields, according to the number of attributes present in the POJO. 
    
    Second, because, and I am speaking strictly as a user here, I would rather 
write:
    private static DataSet<Edge<Long, Double>> getEdgesDataSet(ExecutionEnvironment env) {
        if (fileOutput) {
            return env.readCsvFile(edgeInputPath)
                    .ignoreComments("#")
                    .fieldDelimiter("\t")
                    .lineDelimiter("\n")
                    .types(Long.class, Long.class, Double.class)
                    .map(new Tuple3ToEdgeMap<Long, Double>());
        } else {
            return CommunityDetectionData.getDefaultEdgeDataSet(env);
        }
    }
    
    than...
    
    private static Graph<Long, Long, Double> getGraph(ExecutionEnvironment env) {
        Graph<Long, Long, Double> graph;
        if (!fileOutput) {
            DataSet<Edge<Long, Double>> edges = CommunityDetectionData.getDefaultEdgeDataSet(env);
            graph = Graph.fromDataSet(edges,
                    new MapFunction<Long, Long>() {
                        public Long map(Long label) {
                            return label;
                        }
                    }, env);
        } else {
            graph = Graph.fromCsvReader(edgeInputPath,
                    new MapFunction<Long, Long>() {
                        public Long map(Long label) {
                            return label;
                        }
                    }, env)
                    .ignoreCommentsEdges("#")
                    .fieldDelimiterEdges("\t")
                    .lineDelimiterEdges("\n")
                    .typesEdges(Long.class, Double.class)
                    .typesVertices(Long.class, Long.class);
        }
        return graph;
    }
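    
    For anyone comparing the two versions: both configure the same three 
things (comment prefix, field delimiter, line delimiter) plus the field types. 
A minimal plain-Java sketch of what that amounts to for one edge file is below. 
It is not Flink code; `EdgeCsvSketch`, its `Edge` class, and `parseEdges` are 
made-up names for illustration only, and the input is assumed to be well-formed.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeCsvSketch {

    /** Hypothetical stand-in for an edge with Long ids and a Double value. */
    static final class Edge {
        final long source;
        final long target;
        final double value;

        Edge(long source, long target, double value) {
            this.source = source;
            this.target = target;
            this.value = value;
        }
    }

    /** Parses tab-separated edge lines, skipping empty lines and "#" comments. */
    static List<Edge> parseEdges(String content) {
        List<Edge> edges = new ArrayList<>();
        for (String line : content.split("\n")) {          // line delimiter
            if (line.isEmpty() || line.startsWith("#")) {  // comment prefix
                continue;
            }
            String[] fields = line.split("\t");            // field delimiter
            // Field types: Long, Long, Double (assumes exactly three fields).
            edges.add(new Edge(
                    Long.parseLong(fields[0]),
                    Long.parseLong(fields[1]),
                    Double.parseDouble(fields[2])));
        }
        return edges;
    }

    public static void main(String[] args) {
        String content = "# src\ttrg\tweight\n1\t2\t0.5\n2\t3\t1.5\n";
        for (Edge e : parseEdges(content)) {
            System.out.println(e.source + " -> " + e.target + " (" + e.value + ")");
        }
    }
}
```

    Either API ends up expressing these same choices; the difference is only 
in where they are written down.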
    
    Maybe it's just a preference thing... but I believe it's at least worth a 
discussion. On the other hand, the utility of such a method should have been 
questioned from its early Jira days, so I guess that's my mistake.
    
    I would like to hear your thoughts on this. 
    Thanks!


> Read edges and vertices from CSV files
> --------------------------------------
>
>                 Key: FLINK-1520
>                 URL: https://issues.apache.org/jira/browse/FLINK-1520
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>            Reporter: Vasia Kalavri
>            Assignee: Shivani Ghatge
>            Priority: Minor
>              Labels: easyfix, newbie
>
> Add methods to create Vertex and Edge Datasets directly from CSV file inputs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
