Re: mod_md 1.1.0 repeating on error

Stefan Eissing Tue, 12 Dec 2017 05:22:52 -0800
*without* introducing new ones, I meant. Please provide a log.

> On 12.12.2017 at 14:21, Stefan Eissing <[email protected]> wrote:
> 
> 
> 
>> On 12.12.2017 at 14:17, Steffen <[email protected]> wrote:
>> 
>> To be clear: as I said, I introduced the curl error myself, so I 
>> know exactly what is wrong.
> 
> Ah, that was not clear to me.
> 
> So, what is the error happening with you introducing new ones? Is there 
> nothing to see in the logs or did I miss it?
> 
>> Your reply shows me that you want to keep the endless retry loop. In the 
>> worst case a user can end up with non-working SSL because a certificate is 
>> not renewed.
>> 
>> Why is it retried again and again? These all look like hard errors, except 
>> when LE is temporarily down.
>> 
>> I think it should be fixed. Not everyone is constantly looking at the 
>> error.log.
>> 
>> 
>> What I would like:
>> 
>> Use MDNotifyCmd for the first error AH10057.
>> Now the MDNotifyCmd is only triggered when everything is ok; it seems 
>> logical to also notify when something goes wrong.
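
One way to get a failure notification without any change to MDNotifyCmd, sketched here purely as an assumption and not as something mod_md provides: an external script scans the error log for AH10057 lines and extracts the domain, the error count, and the next run time. The name `find_retry_errors` and the regular expression are hypothetical and modeled only on the log lines quoted in this thread. Since AH10057 is logged at info level, the LogLevel for md has to include info for these lines to appear at all.

```python
import re

# Pattern modeled on the AH10057 lines quoted in this thread (an assumption;
# the exact wording may differ in other mod_md versions).
AH10057 = re.compile(
    r"AH10057: (?P<domain>[^\s:]+): encountered error for the "
    r"(?P<count>\d+)\. time, next run in (?P<next_run>\d+:\d{2}:\d{2}) hours"
)


def find_retry_errors(log_text):
    """Return (domain, error_count, next_run) for every AH10057 line found."""
    return [
        (m["domain"], int(m["count"]), m["next_run"])
        for m in AH10057.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: "
        "encountered error for the 1. time, next run in 0:00:05 hours\n"
    )
    for domain, count, next_run in find_retry_errors(sample):
        if count == 1:
            # First failure: this is where a mail or other alert would go.
            print(f"{domain}: first renewal error, next attempt in {next_run}")
```

A scheduled task could run this over the tail of error.log and alert on the first error for a domain, which is the behaviour asked for above.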
>> 
>> 
>> On Tuesday 12/12/2017 at 13:58, Stefan Eissing wrote: 
>>> 
>>> 
>>>> On 12.12.2017 at 13:47, Steffen <[email protected]> wrote:
>>>> 
>>>> It was happening before 1.1.0, but I did not pay attention to it. I have 
>>>> seen it in several situations, not all of which I can recall, 
>>>> unfortunately (see the retries, for example, in 
>>>> https://github.com/icing/mod_md/issues/52 and 
>>>> https://github.com/icing/mod_md/issues/62 ).
>>>> 
>>>> It is a more serious issue than I thought before. 
>>>> 
>>>> I think we must fix this first, otherwise it is a bad introduction for our 
>>>> users. First-time users in the Windows community found themselves 
>>>> dealing with all kinds of (try) errors, and most of them stopped using 
>>>> it. As I said in another post, mod_md is not that easy to start with.
>>>> 
>>>> Also, when the LogLevel is at the default, warn, users can hardly see what 
>>>> is happening. I advise our users to use LogLevel info md:trace2 ssl:notice
>>>> 
>>>> I have now tested the endless retry loop in the following situations. All 
>>>> were tested during renewal: no new certificate was generated, and httpd 
>>>> kept running fine with the old certificate, which was still valid.
>>>> 
>>>> 1 - Misconfiguration like the one below.
>>>> 2 - ACME CA service down (cause: Let's Encrypt down)
>>>> 3 - ACME CA service not reachable (cause: local network, or OS 
>>>> failure/misconfiguration)
>>>> 4 - Error response (GET/POST errors) when accessing Let's Encrypt, or a 
>>>> dependency issue such as curl or mod_ssl.
>>>> 5 - mod_md/mod_ssl faults
>>>> 6 - There are probably more
>>>> 
>>>> 
>>>> 2) 3) Both can mean that Let's Encrypt is temporarily down, so a retry may 
>>>> make sense there, but it is hard to tell whether the cause is a temporary 
>>>> LE outage, a local issue, or an OS misconfiguration.
>>>> 
>>>> 4) Is a good example: an error response from LE, which happens in quite a 
>>>> few situations: curl issues, rate limits, mod_md faults, etc.
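
The distinction drawn here between errors worth retrying and errors that will not fix themselves can be sketched as a small classifier. This is an illustration only, not how mod_md decides anything: the numeric codes are libcurl result codes (60 is the certificate verification failure that appears in the log in this thread), but the split into "transient" and "hard", the names `classify` and `should_notify`, and the `patience` threshold are all assumptions. As noted above, a connection failure can just as well come from a local misconfiguration, so the "transient" label is only a first guess.

```python
# libcurl result codes; the grouping below is an assumption for illustration.
TRANSIENT_CURL_CODES = {
    6,   # CURLE_COULDNT_RESOLVE_HOST
    7,   # CURLE_COULDNT_CONNECT
    28,  # CURLE_OPERATION_TIMEDOUT
}
HARD_CURL_CODES = {
    60,  # peer certificate cannot be verified against the CA certificates
}


def classify(curl_code):
    """Label a libcurl result code as 'hard', 'transient' or 'unknown'."""
    if curl_code in HARD_CURL_CODES:
        return "hard"
    if curl_code in TRANSIENT_CURL_CODES:
        return "transient"
    return "unknown"


def should_notify(curl_code, attempt, patience=5):
    """Notify at once for hard errors, otherwise after `patience` failures."""
    if classify(curl_code) == "hard":
        return True
    return attempt >= patience


if __name__ == "__main__":
    for code, attempt in [(60, 1), (7, 1), (7, 5)]:
        print(code, attempt, classify(code), should_notify(code, attempt))
```

With this split, retries could continue for every class of error while the administrator still hears about a hard error on its first occurrence.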
>>>> 
>>>> Below I introduced a Curl issue:
>>>> 
>>>> ...
>>>> [md:debug] [pid 7508:tid 1052] mod_md.c(762): AH10055: md watchdog run, 
>>>> auto drive 2 mds
>>>> [md:debug] [pid 7508:tid 1052] mod_md.c(691): AH10052: 
>>>> md(apachelounge.nl): state=2, driving
>>>> [md:debug] [pid 7508:tid 1052] md_reg.c(884): apachelounge.nl: run staging
>>>> [md:debug] [pid 7508:tid 1052] md_acme_drive.c(690): apachelounge.nl: 
>>>> staging started, state=2, can_http=0, can_https=1, challenges='tls-sni-01'
>>>> [md:debug] [pid 7508:tid 1052] md_store_fs.c(690): purge 
>>>> staging/apachelounge.nl (D:/servers/apacheS/md/staging/apachelounge.nl)
>>>> [md:debug] [pid 7508:tid 1052] md_acme.c(144): get directory from 
>>>> https://acme-v01.api.letsencrypt.org/directory
>>>> [md:debug] [pid 7508:tid 1052] md_acme.c(407): req: POST 
>>>> https://acme-v01.api.letsencrypt.org/directory
>>>> [md:debug] [pid 7508:tid 1052] md_curl.c(258): (20014)Internal error 
>>>> (specific information not available): request 10 failed(60): Peer 
>>>> certificate cannot be authenticated with given CA certificates
>>> 
>>> Ok, this needs to be logged at ERROR level, so users do not have to mess 
>>> with LogLevel to see what is going on.
>>> 
>>> As for the reason, this seems to indicate that the curl client finds no way 
>>> to verify the Let's Encrypt server certificate. Can you verify that the 
>>> "curl.exe" can connect to "https://acme-v01.api.letsencrypt.org/directory" 
>>> and retrieve the JSON there *without* you giving it the '-k' or 
>>> '--insecure' option? And where does your curl.exe/libcurl come from? Did 
>>> you build it yourself?
>>> 
>>>> [md:debug] [pid 7508:tid 1052] md_acme.c(425): (20014)Internal error 
>>>> (specific information not available): req sent
>>>> [md:error] [pid 7508:tid 1052] (20014)Internal error (specific information 
>>>> not available): apachelounge.nl: setup 
>>>> ACME(https://acme-v01.api.letsencrypt.org/directory)
>>>> [md:debug] [pid 7508:tid 1052] md_acme_drive.c(912): (20014)Internal error 
>>>> (specific information not available): apachelounge.nl: ACME, ACME staging
>>>> [md:debug] [pid 7508:tid 1052] md_reg.c(891): (20014)Internal error 
>>>> (specific information not available): apachelounge.nl: staging done
>>>> [md:error] [pid 7508:tid 1052] (20014)Internal error (specific information 
>>>> not available): AH10056: processing apachelounge.nl
>>>> [md:info] [pid 7508:tid 1052] AH10057: apachelounge.nl: encountered error 
>>>> for the 6. time, next run in 0:02:40 hours
>>>> ...
>>>> 
>>>> Maybe a little solution: starting httpd, mod_md checks if LE is reachable 
>>>> without error.
>>> 
>>> No, I do not think checking external servers on every httpd restart is a 
>>> good idea.
>>> 
>>>> And a solution for the case below could be: check that port 443 and/or 80 
>>>> is used.
>>>> 
>>>> Still my questions:
>>>> 
>>>> Does the retry stop ?
>>> 
>>> The retry does not stop, but it uses longer and longer retry intervals. 
>>> This is exactly to recover from errors with the ACME server that are 
>>> recoverable, e.g. server/internet down. Your local certificate store 
>>> being unable to verify the LE server will not recover by itself, however.
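
The "longer and longer retry intervals" can be read off the AH10057 lines in this thread: 5 seconds after the first error, then doubling each time. A minimal sketch that reproduces that schedule follows. The 5-second start and the factor of 2 are inferred from the quoted log only, not taken from the mod_md source, and whether mod_md caps the interval at some maximum is not stated here; `retry_delays` and `format_line` are hypothetical names.

```python
from datetime import timedelta


def retry_delays(first_seconds=5, attempts=8):
    """Yield (attempt, delay) pairs, doubling the delay after each error."""
    delay = timedelta(seconds=first_seconds)
    for attempt in range(1, attempts + 1):
        yield attempt, delay
        delay *= 2


def format_line(domain, attempt, delay):
    """Render one line in the style of the AH10057 messages quoted above."""
    return (
        f"AH10057: {domain}: encountered error for the {attempt}. time, "
        f"next run in {delay} hours"
    )


if __name__ == "__main__":
    for attempt, delay in retry_delays():
        print(format_line("apachelounge.nl", attempt, delay))
```

Under these assumptions the eighth error is followed by a wait of 0:10:40, matching the last log line in the original report.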
>>> 
>>>> When does it happen, on what errors ?
>>> 
>>> On any error where signup/renew is necessary and could not complete.
>>> 
>>>> 
>>>> 
>>>> Steffen
>>>> 
>>>> 
>>>> On Tuesday 12/12/2017 at 10:18, Stefan Eissing wrote:
>>>>> Can you switch to "LogLevel md:debug" for a while and send me the 
>>>>> details? Did this start on the v1.1.0 or before that?
>>>>> 
>>>>>> On 11.12.2017 at 16:09, Steffen <[email protected]> wrote:
>>>>>> 
>>>>>> 
>>>>>> Running 1.1.0 with the new naming.
>>>>>> 
>>>>>> When mod_md encounters an error, it looks like it goes into an endless 
>>>>>> loop:
>>>>>> 
>>>>>> 
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered 
>>>>>> error for the 1. time, next run in 0:00:05 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered 
>>>>>> error for the 2. time, next run in 0:00:10 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered 
>>>>>> error for the 3. time, next run in 0:00:20 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered 
>>>>>> error for the 4. time, next run in 0:00:40 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered 
>>>>>> error for the 5. time, next run in 0:01:20 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered 
>>>>>> error for the 6. time, next run in 0:02:40 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered 
>>>>>> error for the 7. time, next run in 0:05:20 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered 
>>>>>> error for the 8. time, next run in 0:10:40 hours
>>>>>> ...
>>>>>> ...
>>>>>> ...
>>>>>> 
>>>>>> The above is during renewal, using port 444.
>>>>>> 
>>>>>> Apache is running fine because the certificate is still valid.
>>>>>> 
>>>>>> Does it stop ?
>>>>>> 
>>>>>> When does it happen, and on which errors? The above happens with: 
>>>>>> (20014)Internal error (specific information not available): AH10056: 
>>>>>> processing apachelounge.nl.
>>>>>> 
>>>>>> What to do? Stopping the above retries can be tricky, because when the 
>>>>>> ACME CA service is temporarily down or not reachable we may well want a 
>>>>>> retry. An unreachable/down error is different from a configuration 
>>>>>> error causing it, as in the above case.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>