On Wed, Apr 1, 2020 at 6:54 PM Marcin Sobczyk <[email protected]> wrote:
>
>
>
> On 4/1/20 4:49 PM, Martin Necas wrote:
>
> Okay, I found the issue: for some reason, the ansible runner wants '\\n' instead
> of '\n'. I have not done anything with the newline symbol in the patch, so I need to
> investigate further; maybe something changed in ansible-runner or
> ansible-runner-service.
> The fix in the engine is simple: in [1], update line 131 to
> `String.valueOf(e.getValue()).replaceAll("\n", "\\\\\\\\n")`, but I want to make
> sure that it won't break anything else.
>
> Can you post a patch, even an unverified one, so I can quickly try it out?
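The proposed engine change relies on two layers of backslash escaping: the Java literal `"\\\\\\\\n"` holds four backslashes followed by `n` at runtime, and `replaceAll` treats backslashes in its replacement string as escapes, so each newline ends up as two literal backslashes followed by `n`. A minimal sketch to check that reading (the class and method names are hypothetical, not engine code), with `String.replace`, which takes its replacement literally, shown as an equivalent spelling:

```java
public class NewlineEscapeDemo {

    // Mirrors the proposed change: replaceAll() parses its replacement, so the
    // literal "\\\\\\\\n" (runtime: four backslashes + 'n') yields two backslashes + 'n'.
    static String escapeWithReplaceAll(String value) {
        return String.valueOf(value).replaceAll("\n", "\\\\\\\\n");
    }

    // Same result with replace(), which does not interpret the replacement:
    // the literal "\\\\n" is already two backslashes + 'n' at runtime.
    static String escapeWithReplace(String value) {
        return String.valueOf(value).replace("\n", "\\\\n");
    }

    public static void main(String[] args) {
        String pem = "-----BEGIN CERTIFICATE-----\nMIID\n-----END CERTIFICATE-----";
        String a = escapeWithReplaceAll(pem);
        System.out.println(a);
        System.out.println(a.equals(escapeWithReplace(pem))); // prints true
    }
}
```

If the literal-replacement form is preferred, `replace` avoids having to reason about regex replacement escaping at all.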

Martin pushed: https://gerrit.ovirt.org/108142
Please verify. Thanks.

> I want to test other changes, but this issue is currently blocking me.
>
>
> On Wed, Apr 1, 2020, 14:41 Marcin Sobczyk <[email protected]> wrote:
>>
>>
>>
>> On 4/1/20 2:23 PM, Martin Necas wrote:
>>
>> It's possible that the issue was introduced in patch [1], but Artur's
>> logs showed a properly formatted ovirt_ca_cert, so I'm not sure about it.
>> Artur/Marcin, could you please check the command in
>> ovirt-engine/share/ovirt-engine/ansible-runner-service-project/artifacts? You
>> should see there the variables with which ansible-playbook is executed.
>> It should be the same as you linked, but I still want to make sure that there isn't
>> some issue. You can also check the stdout file for any errors.
>>
>> I tried changing the host deployment playbook to inject a debug message:
>>
>> - name: Add vdsm cacert files
>>   copy:
>>     content: "{{ ovirt_ca_cert }}"
>>     dest: "{{ filedest }}"
>>     owner: 'root'
>>     group: 'kvm'
>>     mode: 0644
>>   with_items:
>>     - "{{ ovirt_vdsm_trust_store }}/{{ ovirt_vdsm_ca_file }}"
>>     - "{{ ovirt_vdsm_trust_store }}/{{ ovirt_vdsm_spice_ca_file }}"
>>     - "{{ ovirt_libvirt_default_trust_store }}/{{ ovirt_libvirt_default_client_ca_file }}"
>>   loop_control:
>>     loop_var: filedest
>>
>> - name: Show cacert
>>   debug:
>>     msg: CA contents 1987 {{ ovirt_ca_cert }}
>>
>> and the result was:
>>
>> 2020-04-01 06:02:23 EDT - TASK [ovirt-host-deploy-vdsm-certificates : Show cacert] ***********************
>> 2020-04-01 06:02:23 EDT - ok: [lago-basic-suite-master-host-1] => {
>> "msg": "CA contents 1987 -----BEGIN CERTIFICATE-----
>> MIIDhDCCAmygAwIBAgICEAAwDQYJKoZIhvcNAQELBQAwMzELMAkGA1UEBhMCVVMxDTALBgNVBAoM
>> BFRlc3QxFTATBgNVBAMMDGVuZ2luZS4yNDU1NTAeFw0yMDAzMzEwOTU0MjRaFw0zMDAzMzAwOTU0
>> MjRaMDMxCzAJBgNVBAYTAlVTMQ0wCwYDVQQKDARUZXN0MRUwEwYDVQQDDAxlbmdpbmUuMjQ1NTUw
>> ggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQC3c20WyiBD98u6Ty6Yjb48fx9wuYUp2MIK
>> j7E8qlX9QvNgvuudTYugPf040xyi+pcVhbXjqc7PhJoqowzgYxuyBu7W/KZigAp2pWMl12w7J1J/
>> 3Hp2IXD5hM7M6aCQ1jMDLxt1YECZfw+TEFVep1z7oxGZHPRZM8MDvYdBje+oPj41kIL1XNsCOiTy
>> J8auU5/eaFbZFjP/sCDNuN14MnmhJtlVahRouODt86N1DRf3ubkmV/Bcr/Xp4iLx4ycyFiPU31cu
>> Gnb2x8pTMPIbgtMYJTqMnRVrzJPV+ALA/PCSOL6LKkM7Jy4ecVFcGcJfvFpmsvF+qd7NuCOfqA7u
>> l6EnAgMBAAGjgaEwgZ4wHQYDVR0OBBYEFPK3q/RlmHfh5o0KmmTguIALVwFgMFwGA1UdIwRVMFOA
>> FPK3q/RlmHfh5o0KmmTguIALVwFgoTekNTAzMQswCQYDVQQGEwJVUzENMAsGA1UECgwEVGVzdDEV
>> MBMGA1UEAwwMZW5naW5lLjI0NTU1ggIQADAPBgNVHRMBAf8EBTADAQH/MA4GA1UdDwEB/wQEAwIB
>> BjANBgkqhkiG9w0BAQsFAAOCAQEAaHQqbgeG7ReoodKwbmFOFq99YOMrYmLx2llt5s49wz+eZsMN
>> OIja8Dilyhew+r6aM30cXHm6U8dOZpLQ9Ga0Y1hk4Edu6Vu4x51WXZdVTkxIjhD+DrHsuaM0PZsE
>> s1tq+ngBaMFxSdXIWNf7DUEf9hymxfLDoOjjVfxxlFtaDsBmu1dup/N8shzUrZ+bTt8i7TGG/JWl
>> F+Iyq/A1EHXywFwr/ZsEAeRjStFt0IytbYprGi98yt9LRZ4puDooio8PI57crON+Cu9vqHsYU3yc
>> lj8vLtwcr354LlY+nLO+cnslhirZlhIuLtytDvBXA8bNJ3EdlAInCfr6SnXKC61aqA==
>> -----END CERTIFICATE----- "
>>
>> So when running the playbook, it's already broken.
>> Artur, on the other hand, checked the value of the variable by breaking in the
>> engine code, and it seemed OK there.
>> Indeed, I think there's a problem in [1].
>>
>> [1]
>> https://gerrit.ovirt.org/#/c/107683/5/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/common/utils/ansible/AnsibleRunnerHTTPClient.java
>>
>>
>> Martin Necas
>>
>>
>> On Wed, Apr 1, 2020 at 1:22 PM Artur Socha <[email protected]> wrote:
>>>
>>> Posting a public pastebin URL [1]. Apologies for using the private one
>>> before.
>>>
>>> [1] https://pastebin.com/wrw5ME7j
>>> A.
>>>
>>>
>>> On Wed, Apr 1, 2020 at 12:31 PM Artur Socha <[email protected]> wrote:
>>> >
>>> > Adding request content:
>>> > http://pastebin.test.redhat.com/850652
>>> >
>>> > A.
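The symptom both the debug task and the hosts point to is a certificate whose line breaks were turned into spaces. As a hedged illustration (the helper `looksFlattened` is hypothetical and is not part of the engine or the host-deploy playbook), checking for that symptom only requires looking at the character right after the PEM header:

```java
public class PemShapeCheck {

    static final String BEGIN = "-----BEGIN CERTIFICATE-----";

    // Hypothetical helper (not engine or playbook code): returns true when the
    // PEM header is not followed by a line break, i.e. the newlines appear to
    // have been lost, for example replaced by spaces as seen on the hosts.
    static boolean looksFlattened(String pem) {
        int idx = pem.indexOf(BEGIN);
        if (idx < 0) {
            throw new IllegalArgumentException("no PEM header found");
        }
        int next = idx + BEGIN.length();
        // A well-formed PEM block ends the header line right after the marker.
        return next >= pem.length()
                || (pem.charAt(next) != '\n' && pem.charAt(next) != '\r');
    }

    public static void main(String[] args) {
        String good = BEGIN + "\nMIID\n-----END CERTIFICATE-----\n";
        String broken = BEGIN + " MIID -----END CERTIFICATE----- ";
        System.out.println(looksFlattened(good));   // prints false
        System.out.println(looksFlattened(broken)); // prints true
    }
}
```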
>>> >
>>> > On Wed, Apr 1, 2020 at 12:28 PM Artur Socha <[email protected]> wrote:
>>> >>
>>> >> I debugged the flow up to the moment the request is sent via the
>>> >> HTTP client to the ansible runner service, and up to that point it was
>>> >> correct. The JSON did contain a correctly formatted ovirt_ca_cert.
>>> >> Artur
>>> >>
>>> >> On Wed, Apr 1, 2020 at 12:26 PM Marcin Sobczyk <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On 4/1/20 11:54 AM, Martin Perina wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Wed, Apr 1, 2020 at 11:15 AM Marcin Sobczyk <[email protected]>
>>> >>> wrote:
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On 4/1/20 11:06 AM, Marcin Sobczyk wrote:
>>> >>>> >
>>> >>>> >
>>> >>>> > On 4/1/20 9:51 AM, Marcin Sobczyk wrote:
>>> >>>> >> Hi,
>>> >>>> >>
>>> >>>> >> On 4/1/20 8:44 AM, Yedidyah Bar David wrote:
>>> >>>> >>> On Wed, Apr 1, 2020 at 6:21 AM <[email protected]>
>>> >>>> >>> wrote:
>>> >>>> >>>> Project:
>>> >>>> >>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
>>> >>>> >>>>
>>> >>>> >>>> Build:
>>> >>>> >>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/
>>> >>>> >>> Previous build 1547 passed, after many months of failing, thanks to
>>> >>>> >>> Evgeny's work in recent weeks. The build above failed.
>>> >>>> >>> I think the root cause is that the engine tried to connect to vdsm
>>> >>>> >>> right after successfully finishing ansible host-deploy, but failed.
>>> >>>> >>> vdsm.log has:
>>> >>>> >>>
>>> >>>> >>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/artifact/exported-artifacts/test_logs/he-basic-suite-master/post-he_deploy/lago-he-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
>>> >>>> >>>
>>> >>>> >>> 2020-03-31 22:58:49,773-0400 ERROR (Reactor thread) [vds.dispatcher]
>>> >>>> >>> uncaptured python exception, closing channel
>>> >>>> >>> <yajsonrpc.betterAsyncore.Dispatcher connected
>>> >>>> >>> ('::ffff:192.168.222.76', 46754, 0, 0) at 0x7f416c150a90> (<class
>>> >>>> >>> 'ssl.SSLError'>:[X509] no certificate or crl found (_ssl.c:3771)
>>> >>>> >>> [/usr/lib64/python3.6/asyncore.py|readwrite|110]
>>> >>>> >>> [/usr/lib64/python3.6/asyncore.py|handle_write_event|442]
>>> >>>> >>> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|handle_write|74]
>>> >>>> >>> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|_delegate_call|168]
>>> >>>> >>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|handle_write|190]
>>> >>>> >>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_handle_io|194]
>>> >>>> >>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_set_up_socket|154])
>>> >>>> >>> (betterAsyncore:179)
>>> >>>> >>>
>>> >>>> >>> Not sure what might have caused this. Can anyone have a look?
>>> >>>> >>> Thanks.
>>> >>>> >> Probably caused by https://gerrit.ovirt.org/108016
>>> >>>> >> Looking into this.
>>> >>>> >>
>>> >>>> > It turns out that the patch is not the cause of the error per se; it
>>> >>>> > simply uncovered a different problem: the CA on the hosts is broken:
>>> >>>> >
>>> >>>> > [root@lago-basic-suite-master-host-0 certs]# openssl x509 -in /etc/pki/vdsm/certs/cacert.pem -text
>>> >>>> > unable to load certificate
>>> >>>> > 139987452258112:error:0909006C:PEM routines:get_name:no start line:crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE
>>> >>>> It looks like they have spaces instead of newlines.
>>> >>>> When I manually replaced the spaces with newlines, openssl was able to
>>> >>>> read them.
>>> >>>
>>> >>>
>>> >>> Martin/Dana, couldn't this be caused by any recent changes in the
>>> >>> ansible-runner integration?
>>> >>>
>>> >>> This looks like a suspect to me:
>>> >>>
>>> >>> https://gerrit.ovirt.org/#/c/107683/5/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/common/utils/ansible/AnsibleRunnerHTTPClient.java
>>> >>>
>>> >>>>
>>> >>>> >
>>> >>>> >>>
>>> >>>> >>>> Build Number: 1548
>>> >>>> >>>> Build Status: Failure
>>> >>>> >>>> Triggered By: Started by timer
>>> >>>> >>>>
>>> >>>> >>>> -------------------------------------
>>> >>>> >>>> Changes Since Last Success:
>>> >>>> >>>> -------------------------------------
>>> >>>> >>>> Changes for Build #1548
>>> >>>> >>>> [Galit Rosenthal] Fix the repo for suites that weren't moved to no reposync
>>> >>>> >>>>
>>> >>>> >>>>
>>> >>>> >>>> -----------------
>>> >>>> >>>> Failed Tests:
>>> >>>> >>>> -----------------
>>> >>>> >>>> No tests ran.
>>> >>>> >>>
>>> >>>> >>
>>> >>>> >
>>> >>>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Martin Perina
>>> >>> Manager, Software Engineering
>>> >>> Red Hat Czech s.r.o.
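Marcin's manual workaround (turning the spaces back into newlines) can be written as a small repair function. This is a sketch under the assumption, matching what was observed on the hosts, that the only damage is newline-to-space substitution; the class and method names are hypothetical, and the actual fix belongs in the engine rather than in post-processing the deployed file:

```java
public class PemRepair {

    static final String BEGIN = "-----BEGIN CERTIFICATE-----";
    static final String END = "-----END CERTIFICATE-----";

    // Hypothetical helper: rebuilds a PEM certificate whose line breaks were
    // replaced by spaces. Base64 never contains spaces, so every whitespace
    // run between the markers must originally have been a line break.
    static String restorePemNewlines(String flattened) {
        int begin = flattened.indexOf(BEGIN);
        int end = flattened.indexOf(END);
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("not a PEM certificate");
        }
        String body = flattened.substring(begin + BEGIN.length(), end).trim();
        StringBuilder out = new StringBuilder(BEGIN).append('\n');
        for (String chunk : body.split("\\s+")) {
            if (!chunk.isEmpty()) { // skip the lone "" produced by an empty body
                out.append(chunk).append('\n');
            }
        }
        return out.append(END).append('\n').toString();
    }

    public static void main(String[] args) {
        String broken = BEGIN + " MIIDhDCC BFRlc3Q= " + END + " ";
        System.out.print(restorePemNewlines(broken));
    }
}
```

Because whitespace runs of any kind are split on, an already well-formed certificate passes through unchanged, so the function is safe to apply twice.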
>>> >>>
>>> >>
>>> >>
>>> >> --
>>> >>
>>> >> Artur Socha
>>> >>
>>> >> Senior Software Engineer, RHV
>>> >>
>>> >> Red Hat
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > Artur Socha
>>> >
>>> > Senior Software Engineer, RHV
>>> >
>>> > Red Hat
>>>
>>>
>>>
>>> --
>>>
>>> Artur Socha
>>>
>>> Senior Software Engineer, RHV
>>>
>>> Red Hat
>>>
>>
>

--
Didi
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/LERK4PPGXZ4CBPFLNM2GEAWY4D7ZGLGU/
