tl;dr:

If you use resource values with more than three decimal digits of
precision (e.g., you are launching a task that uses 2.5001 CPUs),
please speak up!

====

Mesos uses floating point to represent scalar resource values, such as
the number of CPUs in a resource offer or dynamic reservation. The
master does resource math in floating point, which leads to a few
problems:

* due to roundoff error, frameworks can receive offers that have
unexpected resource values (e.g., MESOS-3990)
* various internal assertions in the master can fail due to roundoff
error (e.g., MESOS-3552).
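
To make the roundoff problem concrete, here is a minimal sketch in Python (not Mesos code; the variable names are made up for illustration). It shows how ordinary double-precision arithmetic on fractional resource amounts can produce values that differ slightly from what you would expect:

```python
# Illustration only: plain IEEE 754 doubles, not actual Mesos code.

# Combining two fractional CPU amounts does not give exactly 0.3.
combined = 0.1 + 0.2
print(combined == 0.3)  # the comparison fails due to roundoff

# Repeatedly subtracting allocations from a total can leave a tiny
# nonzero remainder instead of exactly zero.
remaining = 1.0
for _ in range(10):
    remaining -= 0.1
print(remaining == 0.0)  # also fails: a small residue is left over
```

A leftover residue like this is the kind of value that could show up in an offer, or trip an assertion that expects an exact result.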

In the long term, we can solve these problems by switching to a
fixed-point representation for scalar values. However, that will
require a long deprecation cycle.

In the short term, we should make floating point behavior more
reliable. To do that, I propose:

(1) Resource values will support AT MOST three decimal digits of
precision. Additional precision in resource values will be discarded
(via rounding).

(2) The master will internally use a fixed-point representation to
avoid unpredictable roundoff behavior.
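
As a rough sketch of what (1) and (2) could look like together, the Python below stores each value as an integer count of thousandths. The names `SCALE`, `to_fixed`, and `to_float` are hypothetical, and the actual scheme in the design doc may differ (for example, in how ties are rounded):

```python
# Hypothetical sketch, not the Mesos implementation.
SCALE = 1000  # three decimal digits of precision


def to_fixed(value: float) -> int:
    """Convert a resource value to integer thousandths, rounding away
    any extra precision. Python's round() rounds exact ties to even."""
    return round(value * SCALE)


def to_float(units: int) -> float:
    """Convert integer thousandths back to a floating point value."""
    return units / SCALE


# Extra precision is discarded: 2.5001 CPUs becomes 2.5 CPUs.
cpus = to_float(to_fixed(2.5001))

# Integer arithmetic is exact, so sums and differences behave predictably.
total = to_fixed(0.1) + to_fixed(0.2)
remaining = to_fixed(1.0)
for _ in range(10):
    remaining -= to_fixed(0.1)
```

Under these assumptions, `cpus` is 2.5, `total` equals `to_fixed(0.3)`, and `remaining` ends at exactly 0.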

For more details, please see the design doc here:
https://docs.google.com/document/d/14qLxjZsfIpfynbx0USLJR0GELSq8hdZJUWw6kaY_DXc
-- comments welcome!

Thanks,
Neil
