GitHub user greghogan opened a pull request:
https://github.com/apache/flink/pull/1462
[FLINK-2716] [gelly java scala] New checksum method on DataSet and Graph
This implementation aggregates element hashes computed with `Object.hashCode`.
As noted in FLINK-2716, `TypeComparator` also has a `hash` function, which
simply calls `hashCode` for basic types. For composite types (POJOs, tuples,
and case classes) `hash` is computed over the keyed subset of fields, as noted
by @StephanEwen. The differences between `hashCode` and `hash` are immaterial
for this use case.
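
To illustrate the idea, here is a minimal standalone sketch of a `hashCode`-based checksum. It is not the code in this pull request: the class and method names (`ChecksumSketch`, `Result`, `checksum`, `merge`) are hypothetical, and combining the per-element hashes by addition is an assumption chosen because addition is order-independent, which suits an unordered, partitioned data set.

```java
import java.util.Objects;

/** Hypothetical sketch of a hashCode-based checksum; not the PR's code. */
final class ChecksumSketch {

    /** Element count plus the sum of the elements' hash codes. */
    static final class Result {
        final long count;
        final long checksum;

        Result(long count, long checksum) {
            this.count = count;
            this.checksum = checksum;
        }
    }

    /** Assumption: per-element hash codes are combined by addition. */
    static Result checksum(Iterable<?> elements) {
        long count = 0;
        long sum = 0;
        for (Object element : elements) {
            count++;
            // Treat the 32-bit hash as unsigned before widening to long.
            // Objects.hashCode returns 0 for a null element.
            sum += Objects.hashCode(element) & 0xFFFFFFFFL;
        }
        return new Result(count, sum);
    }

    /** Partial results from separate partitions merge by addition. */
    static Result merge(Result a, Result b) {
        return new Result(a.count + b.count, a.checksum + b.checksum);
    }
}
```

Because both fields are plain sums, the result does not depend on element order or on how the elements are split across partitions, so two runs of a job can be compared by count and checksum alone.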
Should this also be added to the Python API? I cannot find `count()` on
Python's `DataSet`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/greghogan/flink 2716_checksum_method_for_dataset_and_graph
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1462.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1462
----
commit 8ac28776a81bd70918b872d159f0a21c889d081d
Author: Greg Hogan <[email protected]>
Date: 2015-12-15T19:25:54Z
[FLINK-2716] [gelly java scala] New checksum method on DataSet and Graph
----