Fabio Wanner created FLINK-32412:
------------------------------------

             Summary: JobID collisions in FlinkSessionJob
                 Key: FLINK-32412
                 URL: https://issues.apache.org/jira/browse/FLINK-32412
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
    Affects Versions: kubernetes-operator-1.5.0
            Reporter: Fabio Wanner


From time to time we see {{JobId}} collisions in our deployments due to the
low entropy of the generated {{JobId}}. The problem is that, although the
{{uid}} of the k8s resource is a UUID (we don't know of which version), only
its {{hashCode}} is used for the {{JobId}}. The {{hashCode}} is an integer,
thus 32 bits. By the birthday problem, we can expect a collision with a 50%
chance after only about 77,000 random integers.
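To make the birthday-problem numbers concrete, here is a small sketch using the standard approximation n ≈ sqrt(2 · N · ln 2) for the number of uniformly random IDs, drawn from N possible values, at which a collision becomes 50% likely. It assumes uniformly distributed IDs (the best case), and the class and method names are made up for illustration.

{code:java}
public class BirthdayBound {

    /**
     * Approximate number of uniformly random values, drawn from a space of
     * 2^bits possibilities, at which the collision probability reaches ~50%:
     * n = sqrt(2 * 2^bits * ln 2).
     */
    public static double fiftyPercentBound(int bits) {
        return Math.sqrt(2.0 * Math.pow(2, bits) * Math.log(2));
    }

    /**
     * Approximate probability that at least two of n random values collide
     * in a space of 2^bits: p = 1 - exp(-n(n-1) / (2 * 2^bits)).
     */
    public static double collisionProbability(long n, int bits) {
        double space = Math.pow(2, bits);
        return 1.0 - Math.exp(-(double) n * (n - 1) / (2.0 * space));
    }

    public static void main(String[] args) {
        System.out.printf("32 bits: ~%.0f IDs for a 50%% collision chance%n", fiftyPercentBound(32));
        System.out.printf("64 bits: ~%.2e IDs for a 50%% collision chance%n", fiftyPercentBound(64));
    }
}
{code}

For 32 bits this lands at roughly 77,000 IDs, and for 64 bits at roughly 5.1×10^9^, matching the figures in this issue.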

In practice we seem to hit the problem more often than that, possibly because
the {{uid}} is not completely random, which further increases the collision
probability when only part of it is used.

We propose to at least use a full 64 bits of the {{uid}} for the uid-derived
half of the {{JobId}}, in which case about 5.1×10^9^ IDs are needed for a
collision chance of 50%. One could even argue that 64 bits for the generation
number are most probably not needed, so another 32 bits could be spent on the
{{uid}} to increase the entropy of the {{JobId}} even further (this would limit
the maximum generation to 4,294,967,295).
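The 32/32 split mentioned above could look roughly like the following sketch. It is only an illustration under assumptions: it keeps all 64 most significant bits of the {{uid}} plus the upper 32 bits of its least significant half, and stores the generation as an unsigned 32-bit value in the low bits. The {{IdParts}} record, {{pack}} and {{generationOf}} are hypothetical names standing in for the two {{long}} arguments of the {{JobID}} constructor. Which bits are worth keeping depends on the UUID version Kubernetes uses (for a version 4 UUID, a few of these bits are fixed version/variant bits).

{code:java}
import java.util.UUID;

public class PackedJobIdSketch {

    /** Hypothetical stand-in for the two 64-bit halves of a JobID. */
    public record IdParts(long first, long second) {}

    /** Largest generation that fits into an unsigned 32-bit field. */
    public static final long MAX_GENERATION = 0xFFFFFFFFL; // 4,294,967,295

    /** Hypothetical packing: 96 bits of the uid plus a 32-bit generation. */
    public static IdParts pack(String uid, long generation) {
        if (generation < 0 || generation > MAX_GENERATION) {
            throw new IllegalArgumentException("generation out of unsigned 32-bit range: " + generation);
        }
        UUID uuid = UUID.fromString(uid);
        // First half: all 64 most significant bits of the uid.
        long first = uuid.getMostSignificantBits();
        // Second half: upper 32 bits of the uid's least significant half,
        // lower 32 bits reserved for the generation.
        long second = (uuid.getLeastSignificantBits() & 0xFFFFFFFF00000000L) | generation;
        return new IdParts(first, second);
    }

    /** Recovers the generation from the low 32 bits of the second half. */
    public static long generationOf(IdParts parts) {
        return parts.second() & MAX_GENERATION;
    }

    public static void main(String[] args) {
        IdParts parts = pack("123e4567-e89b-42d3-a456-426614174000", 7L);
        System.out.printf("%016x %016x (generation %d)%n",
                parts.first(), parts.second(), generationOf(parts));
    }
}
{code}

The trade-off is the one described above: 32 more bits of entropy for the {{uid}} in exchange for rejecting generations above 4,294,967,295.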

Our suggestion for using 64 bits would be:
{code:java}
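import java.util.UUID;
import org.apache.flink.api.common.JobID;
import org.apache.flink.util.Preconditions;
// Use all 64 most significant bits of the uid instead of its 32-bit hashCode
new JobID(
    UUID.fromString(Preconditions.checkNotNull(uid)).getMostSignificantBits(),
    Preconditions.checkNotNull(generation)
);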
{code}
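A quick way to sanity-check the proposal is to draw many random uids and count duplicate IDs under both schemes. The sketch below assumes the uids are uniformly random and that the current code takes {{String.hashCode()}} of the uid string (both are assumptions). {{JobIdCollisionSimulation}} and its members are hypothetical names. With n = 300,000 uids and uniformly distributed 32-bit hashes (N = 4,294,967,296 values), one would expect roughly n(n-1)/(2N) ≈ 10 duplicates, while duplicates in the 64 most significant bits are practically impossible at that scale.

{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.SplittableRandom;
import java.util.UUID;

public class JobIdCollisionSimulation {

    /** Number of duplicate IDs observed for each scheme. */
    public record Result(int hashCodeCollisions, int msbCollisions) {}

    public static Result simulate(int count, long seed) {
        SplittableRandom random = new SplittableRandom(seed); // fixed seed keeps the run reproducible
        Set<Integer> hashCodes = new HashSet<>();
        Set<Long> msbs = new HashSet<>();
        int hashCodeCollisions = 0;
        int msbCollisions = 0;
        for (int i = 0; i < count; i++) {
            // Random 128-bit value standing in for a Kubernetes resource uid (assumption).
            String uid = new UUID(random.nextLong(), random.nextLong()).toString();
            // Current scheme (assumed): only the 32-bit hashCode of the uid string.
            if (!hashCodes.add(uid.hashCode())) {
                hashCodeCollisions++;
            }
            // Proposed scheme: all 64 most significant bits of the parsed UUID.
            if (!msbs.add(UUID.fromString(uid).getMostSignificantBits())) {
                msbCollisions++;
            }
        }
        return new Result(hashCodeCollisions, msbCollisions);
    }

    public static void main(String[] args) {
        Result result = simulate(300_000, 42L);
        System.out.println("32-bit hashCode collisions: " + result.hashCodeCollisions());
        System.out.println("64-bit msb collisions: " + result.msbCollisions());
    }
}
{code}

Real Kubernetes uids may be less random than this idealized input, so the 32-bit numbers here are, if anything, optimistic.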
Any thoughts on this? I would create a PR once we know how to proceed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
