On Wed, Apr 1, 2020 at 6:54 PM Marcin Sobczyk <[email protected]> wrote:
>
>
>
> On 4/1/20 4:49 PM, Martin Necas wrote:
>
> Okay, I found the issue: for some reason the ansible runner wants '\\n' instead
> of '\n'. I have not done anything with the newline character in the patch, so I
> need to investigate more; maybe something changed in ansible-runner or
> ansible-runner-service.
> The fix in the engine is simple: in [1], update line 131 to
> `String.valueOf(e.getValue()).replaceAll("\n", "\\\\\\\\n")`, but I want to make
> sure that it won't break anything else.
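A small standalone Java sketch of the escaping layers in that one-liner. It is not the engine code; the class NewlineEscapeDemo and its methods are made up for illustration. The Java literal "\\\\\\\\n" is the five-character string \\\\n, and replaceAll() treats backslashes in the replacement as escape characters, so each newline ends up as the three characters \\n. String.replace() does no regex processing, so the shorter literal "\\\\n" gives the same result:

```java
// Illustration only: NewlineEscapeDemo and its methods are hypothetical, not engine code.
final class NewlineEscapeDemo {

    // Variant proposed in the thread. The Java literal "\\\\\\\\n" is the 5-character
    // string \\\\n; replaceAll() unescapes backslashes in the replacement, so every
    // newline becomes the 3 characters \\n.
    static String escapeWithRegex(Object value) {
        return String.valueOf(value).replaceAll("\n", "\\\\\\\\n");
    }

    // Same result without the regex layer: String.replace(CharSequence, CharSequence)
    // inserts its replacement verbatim, and the literal "\\\\n" is already \\n.
    static String escapeLiteral(Object value) {
        return String.valueOf(value).replace("\n", "\\\\n");
    }

    public static void main(String[] args) {
        String pem = "-----BEGIN CERTIFICATE-----\nMIIB\n-----END CERTIFICATE-----";
        System.out.println(escapeWithRegex(pem));
        System.out.println(escapeWithRegex(pem).equals(escapeLiteral(pem)));
    }
}
```

Running it prints the certificate on a single line with \\n where each line break was, followed by true. Whether ansible-runner-service really needs this extra escaping is the open question in this thread; the sketch only shows what string the engine would put into the value before JSON serialization.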
>
> Can you post a patch, even an unverified one for now, so I can quickly try it out?

Martin pushed: https://gerrit.ovirt.org/108142

Please verify. Thanks.

> I want to test other changes, but this issue is currently blocking me.
>
>
> On Wed, 1 Apr 2020 at 14:41, Marcin Sobczyk <[email protected]> wrote:
>>
>>
>>
>> On 4/1/20 2:23 PM, Martin Necas wrote:
>>
>> It's possible that the issue was introduced in patch [1], but Artur's logs
>> showed a properly formatted ovirt_ca_cert, so I'm not sure about it.
>> Artur/Marcin, could you please check the command in
>> ovirt-engine/share/ovirt-engine/ansible-runner-service-project/artifacts? You
>> should see there the variables with which ansible-playbook is executed.
>> They should be the same as the ones you linked, but I still want to make sure
>> there isn't some issue. You can also check the stdout file for problems.
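A rough Java sketch for that check, assuming the usual ansible-runner artifact layout in which each run directory contains a file named command (that layout, and the class name ArtifactCertCheck, are assumptions). A correctly escaped certificate should show the PEM header followed by a \n escape, while a flattened one shows the header followed by a space, so the sketch simply looks for the latter:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists "command" artifact files whose text contains a PEM header
// immediately followed by a space, i.e. a certificate whose newlines became spaces.
final class ArtifactCertCheck {

    static final String FLATTENED_MARKER = "-----BEGIN CERTIFICATE----- ";

    static List<Path> findFlattenedCerts(Path artifactsDir) throws IOException {
        try (Stream<Path> files = Files.walk(artifactsDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equals("command"))
                    .filter(ArtifactCertCheck::containsFlattenedCert)
                    .collect(Collectors.toList());
        }
    }

    private static boolean containsFlattenedCert(Path file) {
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return text.contains(FLATTENED_MARKER);
        } catch (IOException e) {
            return false; // unreadable file: skip it instead of aborting the scan
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the artifacts directory of the ansible-runner-service project.
        for (Path p : findFlattenedCerts(Paths.get(args[0]))) {
            System.out.println("flattened certificate in " + p);
        }
    }
}
```

With Java 11 or later it can be run directly from source, e.g. java ArtifactCertCheck.java /path/to/artifacts.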
>>
>> I tried changing the host deployment playbook to inject a debug message:
>>
>>   - name: Add vdsm cacert files
>>     copy:
>>       content: "{{ ovirt_ca_cert }}"
>>       dest: "{{ filedest }}"
>>       owner: 'root'
>>       group: 'kvm'
>>       mode: 0644
>>     with_items:
>>       - "{{ ovirt_vdsm_trust_store }}/{{ ovirt_vdsm_ca_file }}"
>>       - "{{ ovirt_vdsm_trust_store }}/{{ ovirt_vdsm_spice_ca_file }}"
>>       - "{{ ovirt_libvirt_default_trust_store }}/{{ ovirt_libvirt_default_client_ca_file }}"
>>     loop_control:
>>       loop_var: filedest
>>
>>   - name: Show cacert
>>     debug:
>>       msg: CA contents 1987 {{ ovirt_ca_cert }}
>>
>> and the result was:
>>
>> 2020-04-01 06:02:23 EDT - TASK [ovirt-host-deploy-vdsm-certificates : Show cacert] ***********************
>> 2020-04-01 06:02:23 EDT - ok: [lago-basic-suite-master-host-1] => {
>>     "msg": "CA contents 1987 -----BEGIN CERTIFICATE----- 
>> MIIDhDCCAmygAwIBAgICEAAwDQYJKoZIhvcNAQELBQAwMzELMAkGA1UEBhMCVVMxDTALBgNVBAoM 
>> BFRlc3QxFTATBgNVBAMMDGVuZ2luZS4yNDU1NTAeFw0yMDAzMzEwOTU0MjRaFw0zMDAzMzAwOTU0 
>> MjRaMDMxCzAJBgNVBAYTAlVTMQ0wCwYDVQQKDARUZXN0MRUwEwYDVQQDDAxlbmdpbmUuMjQ1NTUw 
>> ggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQC3c20WyiBD98u6Ty6Yjb48fx9wuYUp2MIK 
>> j7E8qlX9QvNgvuudTYugPf040xyi+pcVhbXjqc7PhJoqowzgYxuyBu7W/KZigAp2pWMl12w7J1J/ 
>> 3Hp2IXD5hM7M6aCQ1jMDLxt1YECZfw+TEFVep1z7oxGZHPRZM8MDvYdBje+oPj41kIL1XNsCOiTy 
>> J8auU5/eaFbZFjP/sCDNuN14MnmhJtlVahRouODt86N1DRf3ubkmV/Bcr/Xp4iLx4ycyFiPU31cu 
>> Gnb2x8pTMPIbgtMYJTqMnRVrzJPV+ALA/PCSOL6LKkM7Jy4ecVFcGcJfvFpmsvF+qd7NuCOfqA7u 
>> l6EnAgMBAAGjgaEwgZ4wHQYDVR0OBBYEFPK3q/RlmHfh5o0KmmTguIALVwFgMFwGA1UdIwRVMFOA 
>> FPK3q/RlmHfh5o0KmmTguIALVwFgoTekNTAzMQswCQYDVQQGEwJVUzENMAsGA1UECgwEVGVzdDEV 
>> MBMGA1UEAwwMZW5naW5lLjI0NTU1ggIQADAPBgNVHRMBAf8EBTADAQH/MA4GA1UdDwEB/wQEAwIB 
>> BjANBgkqhkiG9w0BAQsFAAOCAQEAaHQqbgeG7ReoodKwbmFOFq99YOMrYmLx2llt5s49wz+eZsMN 
>> OIja8Dilyhew+r6aM30cXHm6U8dOZpLQ9Ga0Y1hk4Edu6Vu4x51WXZdVTkxIjhD+DrHsuaM0PZsE 
>> s1tq+ngBaMFxSdXIWNf7DUEf9hymxfLDoOjjVfxxlFtaDsBmu1dup/N8shzUrZ+bTt8i7TGG/JWl 
>> F+Iyq/A1EHXywFwr/ZsEAeRjStFt0IytbYprGi98yt9LRZ4puDooio8PI57crON+Cu9vqHsYU3yc 
>> lj8vLtwcr354LlY+nLO+cnslhirZlhIuLtytDvBXA8bNJ3EdlAInCfr6SnXKC61aqA== 
>> -----END CERTIFICATE----- "
>>
>> So the value is already broken by the time the playbook runs.
>> Artur, on the other hand, checked the value of the variable by setting a
>> breakpoint in the engine code, and it seemed OK there.
>> I think there is indeed a problem in [1].
>>
>> [1] https://gerrit.ovirt.org/#/c/107683/5/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/common/utils/ansible/AnsibleRunnerHTTPClient.java
>>
>>
>> Martin Necas
>>
>>
>> On Wed, Apr 1, 2020 at 1:22 PM Artur Socha <[email protected]> wrote:
>>>
>>> Posting a public pastebin URL [1]. Apologies for using the private one before.
>>>
>>> [1] https://pastebin.com/wrw5ME7j
>>> A.
>>>
>>>
>>> On Wed, Apr 1, 2020 at 12:31 PM Artur Socha <[email protected]> wrote:
>>> >
>>> > Adding request content:
>>> > http://pastebin.test.redhat.com/850652
>>> >
>>> > A.
>>> >
>>> > On Wed, Apr 1, 2020 at 12:28 PM Artur Socha <[email protected]> wrote:
>>> >>
>>> >> I debugged the flow up to the moment the request is sent via the HTTP
>>> >> client to the ansible-runner-service, and up to that point it was correct.
>>> >> The JSON did contain a correctly formatted ovirt_ca_cert.
>>> >> Artur
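For reference, this is what a correctly formatted ovirt_ca_cert looks like once it is inside a JSON request body: each line break is carried as the two-character escape \n. The escaper below is a minimal hand-rolled illustration of the escapes JSON strings require (RFC 8259); JsonStringEscape is a hypothetical name, and this is not the serializer the engine actually uses:

```java
// Illustration only: a minimal JSON string escaper; JsonStringEscape is a hypothetical name.
final class JsonStringEscape {

    // Wraps s in double quotes and applies the escapes JSON strings require:
    // quote, backslash, and control characters below U+0020.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String pem = "-----BEGIN CERTIFICATE-----\nMIIB\n-----END CERTIFICATE-----\n";
        // Correctly formatted value: line breaks travel as \n escapes.
        System.out.println(quote(pem));
        // Already-flattened value: the line breaks are gone before serialization.
        System.out.println(quote(pem.replace('\n', ' ')));
    }
}
```

The first line printed keeps the \n escapes; the second, built from a value whose newlines had already become spaces, has nothing left for a JSON parser to turn back into line breaks.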
>>> >>
>>> >> On Wed, Apr 1, 2020 at 12:26 PM Marcin Sobczyk <[email protected]> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On 4/1/20 11:54 AM, Martin Perina wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Wed, Apr 1, 2020 at 11:15 AM Marcin Sobczyk <[email protected]> wrote:
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On 4/1/20 11:06 AM, Marcin Sobczyk wrote:
>>> >>>> >
>>> >>>> >
>>> >>>> > On 4/1/20 9:51 AM, Marcin Sobczyk wrote:
>>> >>>> >> Hi,
>>> >>>> >>
>>> >>>> >> On 4/1/20 8:44 AM, Yedidyah Bar David wrote:
>>> >>>> >>> On Wed, Apr 1, 2020 at 6:21 AM <[email protected]> wrote:
>>> >>>> >>>> Project:
>>> >>>> >>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
>>> >>>> >>>>
>>> >>>> >>>> Build:
>>> >>>> >>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/
>>> >>>> >>> Previous build 1547 passed after many months of failing, thanks to
>>> >>>> >>> Evgeny's work in recent weeks. The build above failed.
>>> >>>> >>> I think the root cause is that the engine tried to connect to vdsm
>>> >>>> >>> right after successfully finishing ansible host-deploy, but failed.
>>> >>>> >>> vdsm.log has:
>>> >>>> >>>
>>> >>>> >>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1548/artifact/exported-artifacts/test_logs/he-basic-suite-master/post-he_deploy/lago-he-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>> 2020-03-31 22:58:49,773-0400 ERROR (Reactor thread) [vds.dispatcher]
>>> >>>> >>> uncaptured python exception, closing channel
>>> >>>> >>> <yajsonrpc.betterAsyncore.Dispatcher connected
>>> >>>> >>> ('::ffff:192.168.222.76', 46754, 0, 0) at 0x7f416c150a90> (<class
>>> >>>> >>> 'ssl.SSLError'>:[X509] no certificate or crl found (_ssl.c:3771)
>>> >>>> >>> [/usr/lib64/python3.6/asyncore.py|readwrite|110]
>>> >>>> >>> [/usr/lib64/python3.6/asyncore.py|handle_write_event|442]
>>> >>>> >>> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|handle_write|74]
>>> >>>> >>> [/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py|_delegate_call|168]
>>> >>>> >>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|handle_write|190]
>>> >>>> >>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_handle_io|194]
>>> >>>> >>> [/usr/lib/python3.6/site-packages/vdsm/sslutils.py|_set_up_socket|154])
>>> >>>> >>> (betterAsyncore:179)
>>> >>>> >>>
>>> >>>> >>> Not sure what might have caused this. Can anyone have a look? Thanks.
>>> >>>> >> Probably caused by https://gerrit.ovirt.org/108016
>>> >>>> >> Looking into this.
>>> >>>> >>
>>> >>>> > It turns out the patch is not the cause of the error per se; it simply
>>> >>>> > uncovered a different problem. The CA certificate on the hosts is broken:
>>> >>>> >
>>> >>>> > [root@lago-basic-suite-master-host-0 certs]# openssl x509 -in /etc/pki/vdsm/certs/cacert.pem -text
>>> >>>> > unable to load certificate
>>> >>>> > 139987452258112:error:0909006C:PEM routines:get_name:no start line:crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE
>>> >>>> It looks like they contain spaces instead of newlines.
>>> >>>> When I manually replaced the spaces with newlines, openssl was able to
>>> >>>> read them.
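A Java sketch of the manual repair described above; the class name PemRepair is hypothetical. A blanket space-to-newline replacement would also split the "BEGIN CERTIFICATE" and "END CERTIFICATE" marker lines, so the sketch only splits the Base64 body between the markers on whitespace:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: restores line breaks in a PEM certificate whose newlines were
// turned into spaces. Only the Base64 body is split; the marker lines stay intact.
final class PemRepair {

    private static final Pattern PEM = Pattern.compile(
            "-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", Pattern.DOTALL);

    static String repair(String flattened) {
        Matcher m = PEM.matcher(flattened);
        if (!m.find()) {
            throw new IllegalArgumentException("no PEM certificate markers found");
        }
        StringBuilder sb = new StringBuilder("-----BEGIN CERTIFICATE-----\n");
        // Spaces and newlines are both treated as separators, so an already
        // well-formed PEM comes back unchanged.
        for (String chunk : m.group(1).trim().split("\\s+")) {
            if (!chunk.isEmpty()) {
                sb.append(chunk).append('\n');
            }
        }
        return sb.append("-----END CERTIFICATE-----\n").toString();
    }

    public static void main(String[] args) {
        String flat = "-----BEGIN CERTIFICATE----- MIIDhDCC AmygAwIB -----END CERTIFICATE----- ";
        System.out.print(repair(flat));
    }
}
```

Running main() should print the two marker lines with each Base64 chunk on its own line in between; a repaired file can then be checked again with openssl x509 -in <file> -noout.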
>>> >>>
>>> >>>
>>> >>> Martin/Dana, could this be caused by any recent changes in the
>>> >>> ansible-runner integration?
>>> >>>
>>> >>> This looks like a suspect to me:
>>> >>>
>>> >>> https://gerrit.ovirt.org/#/c/107683/5/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/common/utils/ansible/AnsibleRunnerHTTPClient.java
>>> >>>
>>> >>>>
>>> >>>> >
>>> >>>> >>>
>>> >>>> >>>> Build Number: 1548
>>> >>>> >>>> Build Status:  Failure
>>> >>>> >>>> Triggered By: Started by timer
>>> >>>> >>>>
>>> >>>> >>>> -------------------------------------
>>> >>>> >>>> Changes Since Last Success:
>>> >>>> >>>> -------------------------------------
>>> >>>> >>>> Changes for Build #1548
>>> >>>> >>>> [Galit Rosenthal] Fix the repo for suites that weren't moved to no reposync
>>> >>>> >>>>
>>> >>>> >>>>
>>> >>>> >>>>
>>> >>>> >>>>
>>> >>>> >>>> -----------------
>>> >>>> >>>> Failed Tests:
>>> >>>> >>>> -----------------
>>> >>>> >>>> No tests ran.
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>
>>> >>>> >
>>> >>>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Martin Perina
>>> >>> Manager, Software Engineering
>>> >>> Red Hat Czech s.r.o.
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >> --
>>> >>
>>> >> Artur Socha
>>> >>
>>> >> Senior Software Engineer, RHV
>>> >>
>>> >> Red Hat
>>>
>>
>


-- 
Didi
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/LERK4PPGXZ4CBPFLNM2GEAWY4D7ZGLGU/
