[
https://issues.apache.org/jira/browse/FLINK-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098924#comment-15098924
]
ASF GitHub Bot commented on FLINK-2716:
---------------------------------------
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/1462#discussion_r49789099
--- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/utils/package.scala ---
@@ -103,6 +105,25 @@ package object utils {
: DataSet[T] = {
wrap(jutils.sampleWithSize(self.javaSet, withReplacement, numSamples, seed))
}
+
+ // --------------------------------------------------------------------------------------------
+ // Checksum
+ // --------------------------------------------------------------------------------------------
+
+ /**
+ * Convenience method to get the count (number of elements) of a DataSet
+ * as well as the checksum (sum over element hashes).
+ *
+ * @return A ChecksumHashCode with the count and checksum of elements in the data set.
+ *
+ * @see [[org.apache.flink.api.java.Utils.ChecksumHashCodeHelper]]
+ */
+ def checksumHashCode: ChecksumHashCode = {
--- End diff --
It would be good to give this method parentheses. It triggers distributed
execution, so it is not quite side-effect free.
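
For illustration, here is a minimal sketch of the Scala convention this comment refers to: a method that does real work is declared and called with empty parentheses, while parameterless methods without parentheses are reserved for cheap, side-effect-free accessors. The names SketchDataSet, ChecksumResult, and executions are hypothetical stand-ins, not part of the Flink API, and the local counter only imitates the cost of triggering a distributed job.

```scala
// Hypothetical result type, loosely modelled on the count-plus-checksum pair
// described in the Scaladoc above. Not the Flink class.
final case class ChecksumResult(count: Long, checksum: Long)

// Hypothetical stand-in for a DataSet that holds its elements locally.
final class SketchDataSet[T](elements: Seq[T]) {
  // Tracks how often an "execution" was triggered (the side effect).
  var executions: Int = 0

  // Declared with empty parentheses to signal that each call does real work.
  def checksumHashCode(): ChecksumResult = {
    executions += 1
    // Sum over element hashes, as described in the Scaladoc above.
    ChecksumResult(elements.size.toLong, elements.map(_.hashCode.toLong).sum)
  }
}

object ParensSketch {
  def main(args: Array[String]): Unit = {
    val data = new SketchDataSet(Seq(1, 2, 3))
    // The call site also uses parentheses, so the cost is visible to the reader.
    val result = data.checksumHashCode()
    println(s"count=${result.count}, checksum=${result.checksum}, executions=${data.executions}")
  }
}
```

With the parentheses in place, the call site reads data.checksumHashCode(), which makes it harder to mistake the call for a plain field access.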
> Checksum method for DataSet and Graph
> -------------------------------------
>
> Key: FLINK-2716
> URL: https://issues.apache.org/jira/browse/FLINK-2716
> Project: Flink
> Issue Type: Improvement
> Components: DataSet API, Gelly
> Affects Versions: 0.10.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
> Priority: Minor
>
> {{DataSet.count()}}, {{Graph.numberOfVertices()}}, and
> {{Graph.numberOfEdges()}} provide measures of the number of distributed data
> elements. New {{DataSet.checksum()}} and {{Graph.checksum()}} methods will
> summarize the content of data elements and support algorithm validation,
> integration testing, and benchmarking.