[
https://issues.apache.org/jira/browse/MESOS-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627106#comment-14627106
]
Kevin Sweeney commented on MESOS-2940:
--------------------------------------
It's creating and seeding a new Mersenne Twister RNG on every call to
UUID::random(), so you're not really using the pRNG at all: every UUID pays
the full cost of gathering seed entropy.
From https://github.com/3rdparty/stout/blob/master/include/stout/uuid.hpp#L18
{code}
return UUID(boost::uuids::random_generator()());
{code}
This calls the boost::uuids::basic_random_generator<mt19937> default
constructor, which allocates its own RNG and always seeds it:
{code}
{
// seed the random number generator
detail::seed(*pURNG);
}
{code}
This uses http://www.boost.org/doc/libs/1_53_0/boost/uuid/seed_rng.hpp which,
according to its own source comments, is slow:
{noformat}
// It creates random numbers from a sha1 hash of data from a variary of sources,
// all of which are standard function calls. It produces random numbers slowly.
// Peter Dimov provided the details of sha1_random_digest_().
// see http://archives.free.net.ph/message/20070507.175609.4c4f503a.en.html
{noformat}
To improve performance you'd want to use one of the explicit constructors,
{{explicit basic_random_generator(UniformRandomNumberGenerator* pGen)}}, and
provide an instance of a UniformRandomNumberGenerator that you seed once
yourself. You probably want this to be a thread-local, as generator engines
are stateful and not safe to share across threads without locking.
{code}
// keep a pointer to a random number generator
// don't seed a given random number generator
explicit basic_random_generator(UniformRandomNumberGenerator* pGen)
{code}
> Reconciliation is expensive for large numbers of tasks.
> -------------------------------------------------------
>
> Key: MESOS-2940
> URL: https://issues.apache.org/jira/browse/MESOS-2940
> Project: Mesos
> Issue Type: Improvement
> Components: master
> Reporter: Benjamin Mahler
> Assignee: Benjamin Mahler
> Priority: Critical
> Labels: twitter
> Fix For: 0.23.0
>
> Attachments: perf-kernel.svg
>
>
> We've observed that both implicit and explicit reconciliation are expensive
> for large numbers of tasks:
> {noformat: title=Explicit O(100,000) tasks: 70secs}
> I0625 20:55:23.716320 21937 master.cpp:3863] Performing explicit task state
> reconciliation for N tasks of framework F (NAME) at S@IP:PORT
> I0625 20:56:34.812464 21937 master.cpp:5041] Removing task T with resources R
> of framework F on slave S at slave(1)@IP:PORT (HOST)
> {noformat}
> {noformat: title=Implicit with O(100,000) tasks: 60secs}
> I0625 20:25:22.310601 21936 master.cpp:3802] Performing implicit task state
> reconciliation for framework F (NAME) at S@IP:PORT
> I0625 20:26:23.874528 21921 master.cpp:218] Scheduling shutdown of slave S
> due to health check timeout
> {noformat}
> Let's add a benchmark to see if there are any bottlenecks here, and to guide
> improvements.