[
https://issues.apache.org/jira/browse/FLINK-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251023#comment-15251023
]
Greg Hogan commented on FLINK-3789:
-----------------------------------
I was thinking about Clustering Coefficient, for which we return the local
clustering coefficient for each vertex as a DataSet via a GraphAlgorithm. It
would also be nice to compute the global clustering coefficient, which would
need to access accumulators. Both the local and global clustering coefficients
count triangles, so there is certainly an advantage in computing the two
simultaneously, but each carries extra cost, so we should also allow separate
computation.

So there is a need to do things similar to collect and count while still
allowing the user to call execute (which of course allows direct configuration
of the job name) so they can compose multiple algorithms and analytics. Perhaps
instead of overloading these functions we can provide alternative, slightly
more sophisticated options which would allow configuring a job name. In many
ways the current implementation of count, collect, print, and checksum is very
limiting because you can only perform that single action per job. You can't
print and count, or print and write. The current DataSet API works well because
it's simple, but I think we could expand on this.
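
To illustrate the clustering coefficient point, here is a minimal plain-Java
sketch (no Flink, in-memory only) of how a single triangle-counting pass can
yield both the per-vertex local coefficients and the global coefficient. The
class and method names (ClusteringSketch, Result, compute) are hypothetical and
are not part of Gelly; the running totals stand in for what accumulators might
collect in a distributed job.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not the Gelly implementation.
final class ClusteringSketch {

    /** Local coefficient per vertex plus the global coefficient. */
    static final class Result {
        final Map<Integer, Double> local;
        final double global;

        Result(Map<Integer, Double> local, double global) {
            this.local = local;
            this.global = global;
        }
    }

    /** Computes both coefficients for an undirected graph given as {u, v} pairs. */
    static Result compute(int[][] edges) {
        // Build an undirected adjacency map, ignoring self-loops and duplicate edges.
        Map<Integer, Set<Integer>> adj = new TreeMap<>();
        for (int[] e : edges) {
            if (e[0] == e[1]) {
                continue;
            }
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }

        Map<Integer, Double> local = new TreeMap<>();
        long closedTriplets = 0; // triangles seen from each vertex (3 per triangle)
        long allTriplets = 0;    // neighbor pairs centered on each vertex

        for (Map.Entry<Integer, Set<Integer>> entry : adj.entrySet()) {
            List<Integer> nbrs = new ArrayList<>(entry.getValue());
            long degree = nbrs.size();
            long pairs = degree * (degree - 1) / 2;

            // Count neighbor pairs that are themselves connected.
            long triangles = 0;
            for (int i = 0; i < nbrs.size(); i++) {
                for (int j = i + 1; j < nbrs.size(); j++) {
                    if (adj.get(nbrs.get(i)).contains(nbrs.get(j))) {
                        triangles++;
                    }
                }
            }

            local.put(entry.getKey(), pairs == 0 ? 0.0 : (double) triangles / pairs);

            // The same per-vertex counts feed the global totals.
            closedTriplets += triangles;
            allTriplets += pairs;
        }

        double global = allTriplets == 0 ? 0.0 : (double) closedTriplets / allTriplets;
        return new Result(local, global);
    }
}
```

The per-vertex triangle and pair counts are computed once and used twice, which
is the shared work mentioned above. A caller who only wants one of the two
results still pays for the bookkeeping of the other, which is why separate
computation should remain possible.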
> Overload methods which trigger program execution to allow naming job
> --------------------------------------------------------------------
>
> Key: FLINK-3789
> URL: https://issues.apache.org/jira/browse/FLINK-3789
> Project: Flink
> Issue Type: Improvement
> Components: Java API
> Affects Versions: 1.1.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
> Priority: Minor
>
> Overload the following functions to additionally accept a job name to pass to
> {{ExecutionEnvironment.execute(String)}}.
> * {{DataSet.collect()}}
> * {{DataSet.count()}}
> * {{DataSetUtils.checksumHashCode(DataSet)}}
> * {{GraphUtils.checksumHashCode(Graph)}}
> Once the deprecated {{DataSet.print(String)}} and
> {{DataSet.printToErr(String)}} are removed we can overload
> {{DataSet.print()}}.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)