[
https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624390#comment-14624390
]
ASF GitHub Bot commented on FLINK-1520:
---------------------------------------
Github user andralungu commented on the pull request:
https://github.com/apache/flink/pull/847#issuecomment-120856171
Hi,
I just had a closer look at this PR, and it made me seriously question the
utility of a `Graph.fromCSV` method. First, it is more limited than the regular
`env.readCsvFile()`: it does not support POJOs, and adding that support would
be tedious. We would need one method for each field count from 2 to n, to match
the number of attributes present in the POJO.
Second, and I am speaking strictly as a user here, I would rather write:
    private static DataSet<Edge<Long, Double>> getEdgesDataSet(ExecutionEnvironment env) {
        if (fileOutput) {
            return env.readCsvFile(edgeInputPath)
                    .ignoreComments("#")
                    .fieldDelimiter("\t")
                    .lineDelimiter("\n")
                    .types(Long.class, Long.class, Double.class)
                    .map(new Tuple3ToEdgeMap<Long, Double>());
        } else {
            return CommunityDetectionData.getDefaultEdgeDataSet(env);
        }
    }
than...
    private static Graph<Long, Long, Double> getGraph(ExecutionEnvironment env) {
        Graph<Long, Long, Double> graph;
        if (!fileOutput) {
            DataSet<Edge<Long, Double>> edges =
                    CommunityDetectionData.getDefaultEdgeDataSet(env);
            graph = Graph.fromDataSet(edges,
                    new MapFunction<Long, Long>() {
                        public Long map(Long label) {
                            return label;
                        }
                    }, env);
        } else {
            graph = Graph.fromCsvReader(edgeInputPath,
                    new MapFunction<Long, Long>() {
                        public Long map(Long label) {
                            return label;
                        }
                    }, env)
                    .ignoreCommentsEdges("#")
                    .fieldDelimiterEdges("\t")
                    .lineDelimiterEdges("\n")
                    .typesEdges(Long.class, Double.class)
                    .typesVertices(Long.class, Long.class);
        }
        return graph;
    }
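For anyone comparing the two variants without Flink at hand, here is a rough plain-Java sketch of what the reader settings in the first snippet amount to: skip lines starting with "#", split lines on "\n" and fields on "\t", parse the three fields as (Long, Long, Double), and turn each triple into an edge. The `EdgeCsvSketch` class, its nested `Edge` class, and `parseEdges` are hypothetical stand-ins made up for illustration; they are not the Flink or Gelly classes, and the real reader handles many cases this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeCsvSketch {

    /** Hypothetical stand-in for Gelly's Edge<Long, Double>; not the Flink class. */
    static final class Edge {
        final long source;
        final long target;
        final double value;

        Edge(long source, long target, double value) {
            this.source = source;
            this.target = target;
            this.value = value;
        }
    }

    /** Parses tab-separated "source, target, value" lines, skipping '#' comment lines. */
    static List<Edge> parseEdges(String csv) {
        List<Edge> edges = new ArrayList<>();
        for (String line : csv.split("\n")) {               // mirrors lineDelimiter("\n")
            if (line.isEmpty() || line.startsWith("#")) {   // mirrors ignoreComments("#")
                continue;
            }
            String[] fields = line.split("\t");             // mirrors fieldDelimiter("\t")
            // Mirrors types(Long, Long, Double) followed by the tuple-to-edge map step.
            edges.add(new Edge(Long.parseLong(fields[0]),
                               Long.parseLong(fields[1]),
                               Double.parseDouble(fields[2])));
        }
        return edges;
    }

    public static void main(String[] args) {
        String csv = "# source\ttarget\tweight\n1\t2\t0.5\n2\t3\t1.5\n";
        for (Edge e : parseEdges(csv)) {
            System.out.println(e.source + " -> " + e.target + " (" + e.value + ")");
        }
    }
}
```

In both variants the user still has to spell out the same delimiters and field types, so the comparison is mostly about where that configuration lives.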
Maybe it's just a preference thing... but I believe it's at least worth a
discussion. On the other hand, the utility of such a method should have been
questioned from its early Jira days, so I guess that's my mistake.
I would like to hear your thoughts on this.
Thanks!
> Read edges and vertices from CSV files
> --------------------------------------
>
> Key: FLINK-1520
> URL: https://issues.apache.org/jira/browse/FLINK-1520
> Project: Flink
> Issue Type: New Feature
> Components: Gelly
> Reporter: Vasia Kalavri
> Assignee: Shivani Ghatge
> Priority: Minor
> Labels: easyfix, newbie
>
> Add methods to create Vertex and Edge Datasets directly from CSV file inputs.