Hi,

We are running a Foreman instance on CentOS 7.2, receiving Puppet reports from several Puppet 3.6 masters. We have ~1100 hosts which run puppet every hour.
I upgraded our Foreman instance yesterday from 1.11 to 1.12 and then to 1.13. After the 1.13 upgrade we had big problems with Foreman. After running for a while (anywhere from a few minutes up to an hour), all Passenger workers were stuck at 100% CPU with ever-increasing memory usage until they were either killed or taken out by the OOM killer.

I couldn't find anything in the logs to suggest a cause, but with strace I was able to tie it back to four hosts. These hosts have ~500 IP addresses assigned to them. We had problems with Facter in the past where it took too long to iterate over the interfaces on these hosts, so we had removed the interface fact .rb scripts; a recent Facter update had put them back. When I looked in the Foreman database, there were over 10,000 fact values relating to these hosts.

I've removed the Facter .rb files from these hosts again and deleted the associated DB records, and the Foreman box has recovered to its former happy self.

I'm curious to know how I would go about debugging this further, and whether this is something that would be of interest to the Foreman devs. I can reinstate those facts and gather sanitised data while the problem is occurring if required. I did enable debug logging in Foreman, but that didn't really help at all; strace was the only thing that did.

J
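In case it helps anyone else hitting this, below is roughly the strace approach I used. The process-name pattern is an assumption for illustration (on our box the workers show up in `ps` as "Passenger RackApp"); adjust it for your setup.

```shell
#!/bin/sh
# Pick the Passenger worker burning the most CPU (pattern is an
# assumption -- match whatever your workers are called in ps output):
pid=$(ps -eo pid,pcpu,comm --sort=-pcpu | awk '/[P]assenger/ {print $1; exit}')

if [ -n "$pid" ]; then
  # -f follows forks/threads, -s 256 raises the string limit so full
  # hostnames and fact names in the report payload are visible, and
  # -e trace=read narrows the noise down to what the worker is parsing:
  strace -f -s 256 -e trace=read -p "$pid" 2>&1 | head -n 200
else
  echo "no Passenger worker found"
fi
```

The hostnames of the four problem hosts showed up repeatedly in the data the stuck workers were reading, which is what let me tie the CPU spin back to them.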

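For anyone curious where a row count like that comes from, here's the back-of-envelope arithmetic. The exact per-interface fact names are an assumption for illustration (Facter emits several facts per interface along the lines of ipaddress_<if>, netmask_<if>, macaddress_<if>, etc.); with ~500 addresses per host it multiplies out fast:

```ruby
# Assumed set of per-interface facts Facter generates (illustrative only):
per_interface_facts = %w[ipaddress ipaddress6 netmask macaddress mtu network]

interfaces_per_host = 500  # ~500 IP addresses assigned to each host
hosts = 4                  # the four problem hosts

facts_per_host = interfaces_per_host * per_interface_facts.length
total = facts_per_host * hosts

puts "#{facts_per_host} interface facts per host, #{total} across the four hosts"
# => 3000 interface facts per host, 12000 across the four hosts
```

Which is in the right ballpark for the 10,000+ fact values I found in the database, and every report from those hosts means Foreman comparing and updating that many rows.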