[
https://issues.apache.org/jira/browse/FLINK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262588#comment-15262588
]
ASF GitHub Bot commented on FLINK-3277:
---------------------------------------
Github user greghogan commented on the pull request:
https://github.com/apache/flink/pull/1671#issuecomment-215503980
The two implementations have small differences, but the algorithm is the
same. I'll be removing the two steps concerned with degree skew: I had not
previously looked at the degree distribution, but I have not found a graph
that exhibits degree skew under the algorithm's optimization of generating
triplets from the vertex with the smallest degree. A proof would be nice to
have, though.
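For anyone unfamiliar with the optimization, here is a minimal sketch of
generating triplets from the lower-degree endpoint of each edge. It is not the
code from either implementation: the class and method names are hypothetical,
ties are broken by vertex ID as an assumption, and the input is assumed to be
a simple undirected graph with no duplicate edges or self-loops.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DegreeOrderedTriplets {

    /**
     * Returns candidate triplets {center, neighborA, neighborB}.
     * Hypothetical helper for illustration only.
     */
    static List<int[]> triplets(int[][] edges) {
        // Count the degree of every vertex.
        Map<Integer, Integer> degree = new HashMap<>();
        for (int[] e : edges) {
            degree.merge(e[0], 1, Integer::sum);
            degree.merge(e[1], 1, Integer::sum);
        }

        // Attach each edge to its lower-degree endpoint (ties broken by ID).
        Map<Integer, List<Integer>> neighbors = new HashMap<>();
        for (int[] e : edges) {
            int a = e[0], b = e[1];
            int da = degree.get(a), db = degree.get(b);
            boolean aIsLow = da < db || (da == db && a < b);
            int low = aIsLow ? a : b;
            int high = aIsLow ? b : a;
            neighbors.computeIfAbsent(low, k -> new ArrayList<>()).add(high);
        }

        // Every pair of neighbors stored at a vertex is one candidate triplet;
        // a triangle exists if the edge between the pair is also present.
        List<int[]> result = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : neighbors.entrySet()) {
            List<Integer> n = entry.getValue();
            for (int i = 0; i < n.size(); i++) {
                for (int j = i + 1; j < n.size(); j++) {
                    result.add(new int[] {entry.getKey(), n.get(i), n.get(j)});
                }
            }
        }
        return result;
    }
}
```

On a star graph every edge is attached to a leaf, so the hub generates no
triplets at all, whereas pairing neighbors at the hub would generate a number
quadratic in its degree.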
I expect most of the performance difference comes from `DegreeCounter` and
`TriadBuilder` caching objects without supporting object reuse. Using
immutable boxed primitives has the same effect as disabling object reuse,
since deserialization must create a fresh object each time.
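To illustrate the last point, here is a minimal sketch that does not depend
on Flink. `ImmutableLong` and `MutableLong` are hypothetical stand-ins for a
boxed `Long` and a mutable `Value` type, and the two `read` methods only
imitate a deserializer that is handed a reuse target.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class ObjectReuseSketch {

    /** Hypothetical immutable holder, standing in for a boxed Long. */
    static final class ImmutableLong {
        final long value;
        ImmutableLong(long value) { this.value = value; }
    }

    /** Hypothetical mutable holder, standing in for a Value type. */
    static final class MutableLong {
        long value;
    }

    // An immutable target cannot be overwritten, so the reuse argument is
    // ignored and a fresh object is allocated for every record.
    static ImmutableLong readImmutable(long raw, ImmutableLong reuse) {
        return new ImmutableLong(raw);
    }

    // A mutable target is overwritten in place and handed back.
    static MutableLong readMutable(long raw, MutableLong reuse) {
        reuse.value = raw;
        return reuse;
    }

    // Counts the distinct object instances produced while reading all records.
    static int distinctImmutableInstances(long[] records) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        ImmutableLong current = new ImmutableLong(0L);
        for (long raw : records) {
            current = readImmutable(raw, current);
            seen.add(current);
        }
        return seen.size();
    }

    static int distinctMutableInstances(long[] records) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        MutableLong current = new MutableLong();
        for (long raw : records) {
            current = readMutable(raw, current);
            seen.add(current);
        }
        return seen.size();
    }
}
```

Reading n records through the immutable path produces n instances, while the
mutable path keeps returning the single instance it was given.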
> Use Value types in Gelly API
> ----------------------------
>
> Key: FLINK-3277
> URL: https://issues.apache.org/jira/browse/FLINK-3277
> Project: Flink
> Issue Type: Improvement
> Components: Gelly
> Affects Versions: 1.0.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
>
> This would be a breaking change so the discussion needs to happen before the
> 1.0.0 release.
> I think it would benefit Flink to use {{Value}} types wherever possible. The
> {{Graph}} functions {{inDegrees}}, {{outDegrees}}, and {{getDegrees}} each
> return {{DataSet<Tuple2<K, Long>>}}. Using {{Long}} creates a new heap object
> for every serialization and deserialization. The mutable {{Value}} types do
> not suffer from this issue when object reuse is enabled.
> I lean towards a preference for conciseness in documentation and performance
> in examples and APIs.