[
https://issues.apache.org/jira/browse/MESOS-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benjamin Bannier updated MESOS-7800:
------------------------------------
Attachment: stat_all_task_labels.dat
stat_individual_labels.dat
I went through a couple of sample workloads and am attaching files containing
pairs of key length and value length. One file contains an entry for each
{{Label}} used ({{stat_individual_labels.dat}}); the other file
({{stat_all_task_labels.dat}}) contains one entry per task, accumulating the
sizes of all keys and of all values of that task.
Looking at the values, there seem to be two groups of workloads here: one
where all {{Label}} contents fit well within 0.5 kB uncompressed, and another
that seems to need around 16 kB. While the first group clearly only passes
_lightweight data_ as documented, the second group passes encoded data
payloads.
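To illustrate how a candidate limit would separate these two groups, here is a minimal sketch of the accounting used above: it sums key and value lengths per task and compares the total against a limit. The {{Labels}} alias and the functions {{labelPayloadBytes}} and {{fitsLimit}} are hypothetical stand-ins, not Mesos APIs; inside Mesos one would more likely measure the protobuf message itself, e.g., via {{ByteSizeLong}}, which also includes framing overhead that this sketch ignores.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a task's labels: a list of (key, value) pairs.
// This is NOT the protobuf-generated `mesos::Labels` type.
using Labels = std::vector<std::pair<std::string, std::string>>;

// Sums the lengths of all keys and values of a task, i.e., the uncompressed
// payload size. Protobuf framing overhead is deliberately not counted.
std::size_t labelPayloadBytes(const Labels& labels)
{
  std::size_t total = 0;
  for (const auto& label : labels) {
    total += label.first.size() + label.second.size();
  }
  return total;
}

// Returns true if the accumulated label payload does not exceed `limitBytes`.
// Which limit is appropriate is an open question; this only performs the check.
bool fitsLimit(const Labels& labels, std::size_t limitBytes)
{
  return labelPayloadBytes(labels) <= limitBytes;
}
```

With a limit of 4096 bytes, a task from the first group (around 0.5 kB of labels) would pass this check, while a task from the second group (around 16 kB) would be rejected.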
> Tasks with many labels can cause disproportionally huge allocations
> -------------------------------------------------------------------
>
> Key: MESOS-7800
> URL: https://issues.apache.org/jira/browse/MESOS-7800
> Project: Mesos
> Issue Type: Bug
> Components: agent, master
> Reporter: Benjamin Bannier
> Labels: mesosphere
> Attachments: stat_all_task_labels.dat, stat_individual_labels.dat
>
>
> {{mesos.proto}} provides the {{Labels}} message so others can add free-form
> data to a number of messages. In {{TaskInfo}} and {{ExecutorInfo}}, for
> example, we explicitly document
> {quote}
> Therefore, labels should be used to tag tasks with light-weight meta-data.
> {quote}
> However, we never enforce this requirement.
> This becomes problematic e.g., in the agent, where a {{TaskInfo}} will likely
> be copied often, e.g., due to multiple levels of dispatches. I have measured
> that a single {{Label}} can trigger 50-100 concurrent copies in flight on the
> agent's container launch path; our general assumption here seems to be that
> while a {{TaskInfo}} is not necessarily small, it still is not huge.
> If users embed a lot of data into e.g., {{TaskInfo}} {{labels}}, this can lead
> to a temporary explosion of the agent process' memory footprint, which in
> turn can lead to the agent being killed by the OS.
> Due to the potential negative effects of huge {{labels}}, we should evaluate
> how we can limit the amount of data we accept from users. This could mean
> limiting the size of {{TaskInfo}} or {{Labels}} we accept, measured e.g., by
> the message's {{ByteSizeLong}}. It seems that a value somehow related to
> {{ARG_MAX}} would be intuitive, but I am not sure whether we can go as low as
> the POSIX-mandated minimum of 4096 bytes.