[ https://issues.apache.org/jira/browse/FLINK-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578574#comment-14578574 ]
ASF GitHub Bot commented on FLINK-2150:
---------------------------------------
Github user fhueske commented on the pull request:
https://github.com/apache/flink/pull/801#issuecomment-110276236
Hi Andra,
your approach basically follows @rmetzger's suggestion, which is necessary
if you need sequential IDs. However, it comes at the cost of doing two passes
over the data and of temping (materializing) the data after the first map, because
you need to wait for the count before you can assign IDs. Temping data means writing
it to and reading it back from disk if you process a lot of data.
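The two-pass scheme can be sketched in plain Java, with parallel partitions simulated as lists. This is an illustration under that assumption, not Flink's DataSet API, and the class and method names are hypothetical. The first pass counts each partition, an exclusive prefix sum turns the counts into starting offsets, and only then can the second pass hand out IDs, which is why the data has to be kept between the passes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the two-pass approach. Partitions are simulated
 * with plain lists; no Flink API is used.
 */
public class SequentialIds {

    /** Pass 1: count the elements in each partition. */
    static long[] countPerPartition(List<List<String>> partitions) {
        long[] counts = new long[partitions.size()];
        for (int i = 0; i < partitions.size(); i++) {
            counts[i] = partitions.get(i).size();
        }
        return counts;
    }

    /** Exclusive prefix sum: offset of partition i = sum of counts of partitions 0..i-1. */
    static long[] offsets(long[] counts) {
        long[] offsets = new long[counts.length];
        long running = 0;
        for (int i = 0; i < counts.length; i++) {
            offsets[i] = running;
            running += counts[i];
        }
        return offsets;
    }

    /** Pass 2: re-read every partition and assign offset + local position. */
    static List<List<Long>> assignIds(List<List<String>> partitions) {
        long[] offsets = offsets(countPerPartition(partitions));
        List<List<Long>> ids = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            List<Long> partitionIds = new ArrayList<>();
            for (int j = 0; j < partitions.get(i).size(); j++) {
                partitionIds.add(offsets[i] + j);
            }
            ids.add(partitionIds);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("a", "b"), List.of("c"), List.of("d", "e", "f"));
        System.out.println(assignIds(partitions)); // [[0, 1], [2], [3, 4, 5]]
    }
}
```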
My approach won't give sequential IDs, but it works in a pipelined fashion with
a single Mapper and without temping. Each parallel task builds IDs from two
components: its task index and a local counter that starts at 0. A record ID is
created by shifting the counter left by the number of bits needed for the task ID,
which is log2 of the number of tasks rounded up, and putting the task index into
the freed low bits.
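A minimal sketch of the single-pass scheme, again in plain Java with hypothetical names (`UniqueIds`, `IdGenerator`) rather than an actual Flink mapper. Each generator holds one task's index and counter; the shift width is ceil(log2(numTasks)), computed here with `Integer.numberOfLeadingZeros`.

```java
/**
 * Hypothetical sketch of the single-pass scheme: each parallel task combines
 * a local counter (starting at 0) with its own task index.
 */
public class UniqueIds {

    /** Bits needed to represent task indices 0..numTasks-1, i.e. ceil(log2(numTasks)). */
    static int taskIdBits(int numTasks) {
        if (numTasks < 1) {
            throw new IllegalArgumentException("numTasks must be >= 1");
        }
        return 32 - Integer.numberOfLeadingZeros(numTasks - 1);
    }

    /** Hypothetical per-task generator; in a real job this state would live in one parallel mapper instance. */
    static final class IdGenerator {
        private final int taskIndex;
        private final int shift;
        private long counter = 0;

        IdGenerator(int taskIndex, int numTasks) {
            this.taskIndex = taskIndex;
            this.shift = taskIdBits(numTasks);
        }

        /** Counter goes into the high bits, task index into the low bits. */
        long next() {
            long id = (counter << shift) | taskIndex;
            counter++;
            return id;
        }
    }

    public static void main(String[] args) {
        int numTasks = 3; // task indices 0, 1, 2 need 2 bits
        for (int task = 0; task < numTasks; task++) {
            IdGenerator gen = new IdGenerator(task, numTasks);
            System.out.println("task " + task + ": " + gen.next() + ", " + gen.next() + ", " + gen.next());
        }
        // task 0: 0, 4, 8
        // task 1: 1, 5, 9
        // task 2: 2, 6, 10
    }
}
```

With three tasks the IDs are unique but leave gaps (3, 7 and 11 are never produced), and because the counter is shifted, each task can emit about 2^(63 - shift) IDs before the long overflows.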
> Add a library method that assigns unique Long values to vertices
> ----------------------------------------------------------------
>
> Key: FLINK-2150
> URL: https://issues.apache.org/jira/browse/FLINK-2150
> Project: Flink
> Issue Type: New Feature
> Components: Gelly
> Reporter: Vasia Kalavri
> Assignee: Andra Lungu
> Priority: Minor
> Labels: starter
>
> In some graph algorithms, it is required to initialize the vertex values with
> unique values (e.g. label propagation).
> This issue proposes adding a Gelly library method that receives an input
> graph and initializes its vertex values with unique Long values.
> This method can then also be used to improve the MusicProfiles example.