Re: [foreman-users] Lots of "Mysql2::Error: Deadlock found when trying to get lock" under increased load

Lukas Zapletal Tue, 19 Sep 2017 01:14:15 -0700

I would rather make the import code faster than move to async; that
is the last resort.


Konstantin, thanks for the analysis. Our import code is indeed slow; we
improved it a bit in 1.14. Note that we mostly test this on PostgreSQL.
For each import, there is a log entry at INFO level showing how much time
was spent in each phase of the import (delete, add, update). Can you share
the numbers from there?

What happens, *I think*, is that by default every node tries to update
its facts every 5 minutes. At that scale, you need to increase this to
a more reasonable value.
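
As a rough back-of-the-envelope check, here is a small Python sketch of the
average request rate. It assumes the 7600 discovered systems mentioned below
all upload facts on a fixed interval and that uploads are spread evenly; both
are simplifications, and the function name is only for illustration.

```python
def fact_uploads_per_second(nodes: int, interval_seconds: float) -> float:
    """Average fact-upload rate, assuming uploads are spread evenly in time."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return nodes / interval_seconds


if __name__ == "__main__":
    nodes = 7600  # discovered systems quoted in this thread
    for minutes in (5, 15, 30, 60):
        rate = fact_uploads_per_second(nodes, minutes * 60)
        print(f"{minutes:>2} min interval: {rate:5.1f} uploads/s on average")
```

Under these assumptions a 5 minute interval works out to roughly 25 uploads
per second, and a 30 minute interval to roughly 4 per second. Real traffic
is burstier than an average suggests.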

When Foreman is busy, these requests can stack up. We are not using
transactions, so imports can fail leaving incorrect records.
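
To illustrate that failure mode, here is a minimal sketch using Python's
sqlite3 module. It is not Foreman's actual Rails code, and the table, column,
and function names are made up for the example. Without a transaction, an
import that fails halfway leaves a host row behind without its facts; inside
a transaction, the partial write is rolled back.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # isolation_level=None means autocommit: each statement is committed on
    # its own unless we issue BEGIN/COMMIT/ROLLBACK explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE facts (host_id INTEGER, name TEXT, value TEXT NOT NULL)"
    )
    return conn


def import_host(conn, name, facts, use_transaction):
    """Insert a host and its facts; a None value simulates a mid-import failure."""
    if use_transaction:
        conn.execute("BEGIN")
    try:
        cur = conn.execute("INSERT INTO hosts (name) VALUES (?)", (name,))
        for fact_name, value in facts.items():
            conn.execute(
                "INSERT INTO facts (host_id, name, value) VALUES (?, ?, ?)",
                (cur.lastrowid, fact_name, value),
            )
        if use_transaction:
            conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        if use_transaction:
            conn.execute("ROLLBACK")  # undo the host row and any facts
        return False


def host_count(conn):
    return conn.execute("SELECT COUNT(*) FROM hosts").fetchone()[0]


# The second fact violates NOT NULL, so the import fails partway through.
bad_facts = {"macaddress": "90:e2:ba:e6:cc:70", "memory": None}

plain = make_db()
import_host(plain, "mac90e2bae6cc70", bad_facts, use_transaction=False)
print("without transaction:", host_count(plain), "host row(s) left behind")

safe = make_db()
import_host(safe, "mac90e2bae6cc70", bad_facts, use_transaction=True)
print("with transaction:", host_count(safe), "host row(s) left behind")
```

The first case leaves one orphaned host row and the second leaves none. This
only shows the partial-write problem; it does not by itself address the
deadlocks reported on MySQL.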

LZ

On Tue, Sep 19, 2017 at 8:12 AM, Ohad Levy <[email protected]> wrote:
>
>
> On Fri, Sep 15, 2017 at 8:56 PM, 'Konstantin Orekhov' via Foreman users
> <[email protected]> wrote:
>>
>>
>>>
>>> what kind of load do you have? Puppet? Facter? Is that ENC? Something
>>> else?
>>>
>>> Can you tell which requests are slow from logs or monitoring?
>>>
>>
>> Yes, I should have mentioned that - there's very little puppet and ENC
>> work done by this cluster at this point (more is coming soon though). Host
>> discovery is by far the largest workload - 7600 discovered systems at this
>> point. The last spike where I saw an impact on overall flows was when
>> 300-400 systems were trying to register at the same time. Because of the
>> deadlocks, about 200-300 systems repeatedly failed to register and had to
>> keep retrying for a rather long time.
>> Rather often these registration attempts would end up creating either
>> duplicate entries with the same "mac<mac>" name but different IDs in the DB
>> or an "empty" discovery host entry. Either of these would prevent a system
>> from registering successfully until it is removed (I had to write a little
>> script that runs from cron to do so). Here are examples of an
>> "empty" record and duplicate ones (as they get deleted):
>>
>
> Lukas - how about we change discovery to be async? e.g. import all new
> discovered systems into an active job and then process them one / multiple
> at a time? I assume this would require an image change too (so it knows
> when the discovery "job" is done)
>>
>> {
>>     "id": 437923,
>>     "name": "mac90e2bae6cc70",
>>     "last_compile": null,
>>     "last_report": null,
>>     "updated_at": "2017-08-22T07:08:54.000Z",
>>     "created_at": "2017-08-22T07:08:54.000Z",
>>     "root_pass": "<removed>",
>>     "architecture_id": null,
>>     "operatingsystem_id": null,
>>     "environment_id": null,
>>     "ptable_id": null,
>>     "medium_id": null,
>>     "build": false,
>>     "comment": null,
>>     "disk": null,
>>     "installed_at": null,
>>     "model_id": null,
>>     "hostgroup_id": null,
>>     "owner_id": null,
>>     "owner_type": null,
>>     "enabled": true,
>>     "puppet_ca_proxy_id": null,
>>     "managed": false,
>>     "use_image": null,
>>     "image_file": null,
>>     "uuid": null,
>>     "compute_resource_id": null,
>>     "puppet_proxy_id": null,
>>     "certname": null,
>>     "image_id": null,
>>     "organization_id": null,
>>     "location_id": null,
>>     "otp": null,
>>     "realm_id": null,
>>     "compute_profile_id": null,
>>     "provision_method": null,
>>     "grub_pass": "",
>>     "global_status": 0,
>>     "lookup_value_matcher": null,
>>     "discovery_rule_id": null,
>>     "salt_proxy_id": null,
>>     "salt_environment_id": null,
>>     "pxe_loader": null
>> }
>>
>> Duplicates (usually the later duplicate would be an empty one as well, but
>> not all the time):
>>
>> {
>>   "id": 430090,
>>   "name": "mac3417ebe3f8f1",
>>   "last_compile": null,
>>   "last_report": "2017-09-14T19:47:55.000Z",
>>   "updated_at": "2017-09-14T19:47:57.000Z",
>>   "created_at": "2017-03-08T20:24:05.000Z",
>>   "root_pass": "<removed>",
>>   "architecture_id": null,
>>   "operatingsystem_id": null,
>>   "environment_id": null,
>>   "ptable_id": null,
>>   "medium_id": null,
>>   "build": false,
>>   "comment": null,
>>   "disk": null,
>>   "installed_at": null,
>>   "model_id": 3,
>>   "hostgroup_id": null,
>>   "owner_id": 10,
>>   "owner_type": "User",
>>   "enabled": true,
>>   "puppet_ca_proxy_id": null,
>>   "managed": false,
>>   "use_image": null,
>>   "image_file": null,
>>   "uuid": null,
>>   "compute_resource_id": null,
>>   "puppet_proxy_id": null,
>>   "certname": null,
>>   "image_id": null,
>>   "organization_id": null,
>>   "location_id": null,
>>   "otp": null,
>>   "realm_id": null,
>>   "compute_profile_id": null,
>>   "provision_method": null,
>>   "grub_pass": "",
>>   "global_status": 0,
>>   "lookup_value_matcher": null,
>>   "discovery_rule_id": null,
>>   "salt_proxy_id": null,
>>   "salt_environment_id": null,
>>   "pxe_loader": null
>> }
>> {
>>   "id": 438146,
>>   "name": "mac3417ebe3f8f1",
>>   "last_compile": null,
>>   "last_report": "2017-09-11T08:58:05.000Z",
>>   "updated_at": "2017-09-11T08:58:07.000Z",
>>   "created_at": "2017-08-24T19:44:23.000Z",
>>   "root_pass": "<removed>",
>>   "architecture_id": null,
>>   "operatingsystem_id": null,
>>   "environment_id": null,
>>   "ptable_id": null,
>>   "medium_id": null,
>>   "build": false,
>>   "comment": null,
>>   "disk": null,
>>   "installed_at": null,
>>   "model_id": null,
>>   "hostgroup_id": null,
>>   "owner_id": null,
>>   "owner_type": null,
>>   "enabled": true,
>>   "puppet_ca_proxy_id": null,
>>   "managed": false,
>>   "use_image": null,
>>   "image_file": null,
>>   "uuid": null,
>>   "compute_resource_id": null,
>>   "puppet_proxy_id": null,
>>   "certname": null,
>>   "image_id": null,
>>   "organization_id": null,
>>   "location_id": null,
>>   "otp": null,
>>   "realm_id": null,
>>   "compute_profile_id": null,
>>   "provision_method": null,
>>   "grub_pass": "",
>>   "global_status": 0,
>>   "lookup_value_matcher": null,
>>   "discovery_rule_id": null,
>>   "salt_proxy_id": null,
>>   "salt_environment_id": null,
>>   "pxe_loader": null
>> }
>>
>> I can't tell if any queries are slow - can you remind me how to do that?
>> Thanks!
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Foreman users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/foreman-users.
>> For more options, visit https://groups.google.com/d/optout.
>
>



-- 
Later,
  Lukas @lzap Zapletal

