[ 
https://issues.apache.org/jira/browse/MESOS-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612634#comment-16612634
 ] 

Benjamin Bannier commented on MESOS-9227:
-----------------------------------------

I believe that, to some degree, the way our fixed-point math truncates small 
fractions has prevented exactness issues for smaller values.

Since scalar resource values are stored as {{double}} internally in the 
{{Resource}} message, they can only hold around 15 significant digits. We want 
to guarantee correct fixed-point math with up to three decimal places, so we 
can only represent values exactly up to around 10¹¹ kB = 0.1 PB.

Such an amount of {{disk}} is unfortunately not unrealistic even for a single 
agent, where we might already run into correctness issues, but it should be 
possible to, e.g., warn users that agent resources might not be representable. 
The issue is worse if the total capacity of {{disk}} in the cluster reaches 
petabyte scale (either with some agents with huge but representable disks, or 
many agents with considerable disks). The sum of {{disk}} might not be 
representable in the master even though each agent stays below the obviously 
problematic threshold, making such issues harder to diagnose.

A possible short-term mitigation might be to store disk resources in GB instead 
of kB, which would buy us a couple of orders of magnitude at the cost of being 
unable to represent values smaller than around 1 MB.

> `Value::Scalar` cannot handle large floating point calculation due to fixed 
> point conversion.
> ---------------------------------------------------------------------------------------------
>
>                 Key: MESOS-9227
>                 URL: https://issues.apache.org/jira/browse/MESOS-9227
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Meng Zhu
>            Priority: Blocker
>
> While `scalar` holds a `double`, internally we convert floating point to 
> fixed point to ensure only three decimal digits:
> https://github.com/apache/mesos/blob/851ec9c5dca672ed4efc77545c86121463695e4f/src/common/values.cpp#L48-L53
> And all internal arithmetic calculations are done using `long long`, e.g.:
> https://github.com/apache/mesos/blob/851ec9c5dca672ed4efc77545c86121463695e4f/src/common/values.cpp#L123-L128
> This has the unexpected consequence that large values cannot be handled. 
> One impacted use case we are seeing is with exabytes of disk, which will 
> overflow the fixed point representation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
