[ https://issues.apache.org/jira/browse/FLINK-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256433#comment-15256433 ]
Aljoscha Krettek commented on FLINK-3806:
-----------------------------------------
Correct, I think the {{count()}}/{{collect()}} methods are somewhat dangerous
as long as every call re-executes the job and recomputes the result.
> Revert use of DataSet.count() in Gelly
> --------------------------------------
>
> Key: FLINK-3806
> URL: https://issues.apache.org/jira/browse/FLINK-3806
> Project: Flink
> Issue Type: Improvement
> Components: Gelly
> Affects Versions: 1.1.0
> Reporter: Greg Hogan
> Priority: Critical
>
> FLINK-1632 replaced {{GraphUtils.count}} with {{DataSetUtils.count}}. The
> former returns a {{DataSet}} while the latter executes a job to return a Java
> value.
> {{DataSetUtils.count}} is called from {{Graph.numberOfVertices}} and
> {{Graph.numberOfEdges}} which are called from {{GatherSumApplyIteration}} and
> {{ScatterGatherIteration}} as well as the {{PageRank}} algorithms when the
> user does not pass the number of vertices as a parameter.
> As noted in FLINK-1632, this does make the code simpler but, if my
> understanding is correct, it will materialize the Graph twice: once for the
> count and once for the actual computation. Each time, the Graph will need to
> be reread from input, regenerated, or recomputed by preceding algorithms.
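To illustrate the concern in the quoted description, here is a minimal plain-Java sketch of why an eager count on a lazily evaluated plan can run the upstream work twice. It does not use Flink: {{LazyData}}, {{source()}}, and the {{materializations}} counter are hypothetical stand-ins, and the assumption (taken from the description) is that each eager call re-executes the whole plan.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RecomputeSketch {
    // Hypothetical stand-in for a lazily evaluated data set; NOT Flink's DataSet.
    static final class LazyData {
        private final Supplier<List<Integer>> plan;

        LazyData(Supplier<List<Integer>> plan) {
            this.plan = plan;
        }

        // Eager: executes the whole plan and returns a plain Java value.
        long count() {
            return plan.get().size();
        }

        // Eager: executes the whole plan (again) and returns its rows.
        List<Integer> collect() {
            return plan.get();
        }
    }

    // Counts how often the source was read, regenerated, or recomputed.
    static final AtomicInteger materializations = new AtomicInteger();

    static LazyData source() {
        return new LazyData(() -> {
            materializations.incrementAndGet();
            return Arrays.asList(1, 2, 3, 4);
        });
    }

    // Eager count first, then the actual computation: the plan runs twice.
    static int eagerCountThenCompute() {
        materializations.set(0);
        LazyData data = source();
        long n = data.count();
        List<Integer> rows = data.collect();
        return materializations.get();
    }

    // Count derived inside the same execution: the plan runs once.
    static int countWithinSinglePass() {
        materializations.set(0);
        LazyData data = source();
        List<Integer> rows = data.collect();
        long n = rows.size();
        return materializations.get();
    }

    public static void main(String[] args) {
        System.out.println("eager count then compute: " + eagerCountThenCompute());
        System.out.println("count within single pass: " + countWithinSinglePass());
    }
}
```

In this sketch the eager variant reads the source twice, while keeping the count inside the same execution reads it once. That is roughly the difference between a count that triggers its own job and one that stays part of the plan as a data set.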