[ https://issues.apache.org/jira/browse/FLINK-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062214#comment-15062214 ]
ASF GitHub Bot commented on FLINK-2716:
---------------------------------------
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/1462#issuecomment-165489647
I think this is a nice addition. I am unsure, though, whether `checksum()` should be
added to the `DataSet` directly. It is a tradeoff between convenience and API
overload.
A simpler alternative would be `Checksum chk = Utils.checksum(dataSet);`.
Also, this is only one way of computing a simple checksum (hash code). There may
be other arithmetic methods, so it makes sense to include some form of the name of
the checksum method in the method name. How about something like
`Utils.checksumHashCode(dataSet)` or so?
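To make the hash-code checksum idea concrete, here is a minimal sketch in plain Java that runs without a cluster. It assumes the checksum is an element count plus the sum of element hash codes. The class `ChecksumHashCode` and its methods are hypothetical illustrations, not Flink's API. In Flink, each parallel partition would compute its partial result in a distributed operation rather than through the local `of(...)` helper used here. Because addition is commutative and associative (even with long overflow), the result does not depend on element order or on how elements are partitioned.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of an order-independent checksum:
 * element count plus the sum of element hash codes.
 * Names are illustrative only, not Flink's actual API.
 */
public final class ChecksumHashCode {
    private long count;
    private long checksum;

    /** Folds one element into the running count and hash-code sum. */
    public void add(Object element) {
        count++;
        checksum += element.hashCode();
    }

    /** Merges a partial result, e.g. one computed on another partition. */
    public void merge(ChecksumHashCode other) {
        count += other.count;
        checksum += other.checksum;
    }

    public long getCount() {
        return count;
    }

    public long getChecksum() {
        return checksum;
    }

    /** Checksum over a local collection, standing in for one partition. */
    public static ChecksumHashCode of(Iterable<?> elements) {
        ChecksumHashCode chk = new ChecksumHashCode();
        for (Object e : elements) {
            chk.add(e);
        }
        return chk;
    }

    @Override
    public String toString() {
        return "count=" + count + ", checksum=" + checksum;
    }

    public static void main(String[] args) {
        // Two "partitions" merged give the same result as the whole data set
        // in a different order.
        ChecksumHashCode partial = ChecksumHashCode.of(Arrays.asList("a", "b"));
        partial.merge(ChecksumHashCode.of(Arrays.asList("c")));
        ChecksumHashCode whole = ChecksumHashCode.of(Arrays.asList("c", "a", "b"));
        System.out.println(partial + " / " + whole);
    }
}
```

Pairing the sum with the count, much like `DataSet.count()`, helps catch cases where elements are lost or duplicated. A plain hash-code sum is weak, though: for example, two elements whose hash codes cancel out go undetected. That weakness is one reason to name the method after its arithmetic, as suggested above.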
> Checksum method for DataSet and Graph
> -------------------------------------
>
> Key: FLINK-2716
> URL: https://issues.apache.org/jira/browse/FLINK-2716
> Project: Flink
> Issue Type: Improvement
> Components: Gelly, Java API, Scala API
> Affects Versions: 0.10.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
> Priority: Minor
>
> {{DataSet.count()}}, {{Graph.numberOfVertices()}}, and
> {{Graph.numberOfEdges()}} provide measures of the number of distributed data
> elements. New {{DataSet.checksum()}} and {{Graph.checksum()}} methods will
> summarize the content of data elements and support algorithm validation,
> integration testing, and benchmarking.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)