*without* introducing new ones, I meant. Please provide a log.
> On 12.12.2017 at 14:21, Stefan Eissing <stefan.eiss...@greenbytes.de> wrote:
>
>
>
>> On 12.12.2017 at 14:17, Steffen <i...@apachelounge.com> wrote:
>>
>> To be clear: as I said, I introduced the curl error myself, so I
>> know exactly what is wrong.
>
> Ah, that was not clear to me.
>
> So, what is the error happening with you introducing new ones? Is there
> nothing to see in the logs or did I miss it?
>
>> Your reply shows me that you want to keep the endless retry loop. In the
>> worst case a user can end up with non-working SSL because a certificate
>> was not renewed.
>>
>> Why is it retried again and again? These all look like hard errors,
>> except when LE is temporarily down.
>>
>> I think it should be fixed. Not everyone is constantly looking at the
>> error.log.
>>
>>
>> What I would like:
>>
>> Use MDNotifyCmd for the first error AH10057.
>> Currently MDNotifyCmd is only triggered when things are ok; it seems
>> logical to also notify when something is wrong.
>>
>>
>> On Tuesday 12/12/2017 at 13:58, Stefan Eissing wrote:
>>>
>>>
>>>> On 12.12.2017 at 13:47, Steffen <i...@apachelounge.com> wrote:
>>>>
>>>> It was happening before 1.1.0, but I did not pay attention to it. I have
>>>> seen it in several situations, which I unfortunately cannot all recall
>>>> (for examples of the retries, see
>>>> https://github.com/icing/mod_md/issues/52 and
>>>> https://github.com/icing/mod_md/issues/62 ).
>>>>
>>>> It is a more serious issue than I thought before.
>>>>
>>>> I think we must fix this first, otherwise it is a bad introduction for
>>>> our users. First-time users in the Windows community found that they
>>>> were dealing with all kinds of (try) errors, and most of them stopped
>>>> using it. As said in another post, mod_md is not that easy to start
>>>> with.
>>>>
>>>> Also, when the LogLevel is at the default warn, users hardly see what is
>>>> happening. I advise our users to use LogLevel info md:trace2 ssl:notice
>>>>
>>>> I have now tested the endless retry loop in the following situations.
>>>> All were tested during renewal: no new certificate was generated, and
>>>> httpd kept running fine with the old certificate, which was still valid.
>>>>
>>>> 1 - Misconfiguration like below.
>>>> 2 - ACME CA service down (because Let's Encrypt is down)
>>>> 3 - ACME CA service not reachable (caused by the local network, or OS
>>>> failure/misconfig)
>>>> 4 - Error response (GET/POST errors) when accessing Let's Encrypt, or a
>>>> dependency issue such as curl or mod_ssl.
>>>> 5 - mod_md/mod_ssl faults
>>>> 6 - There are probably more
>>>>
>>>>
>>>> 2) and 3): in both cases Let's Encrypt may be temporarily down, so maybe
>>>> retry there, but it is hard to tell whether the cause is LE being
>>>> temporarily down, a local issue, or an OS misconfig.
>>>>
>>>> 4) is a good example: an error response from LE, which happens in quite
>>>> a few situations: curl issues, rate limits, mod_md faults, etc.
>>>>
>>>> Below I introduced a Curl issue:
>>>>
>>>> ...
>>>> [md:debug] [pid 7508:tid 1052] mod_md.c(762): AH10055: md watchdog run,
>>>> auto drive 2 mds
>>>> [md:debug] [pid 7508:tid 1052] mod_md.c(691): AH10052:
>>>> md(apachelounge.nl): state=2, driving
>>>> [md:debug] [pid 7508:tid 1052] md_reg.c(884): apachelounge.nl: run staging
>>>> [md:debug] [pid 7508:tid 1052] md_acme_drive.c(690): apachelounge.nl:
>>>> staging started, state=2, can_http=0, can_https=1, challenges='tls-sni-01'
>>>> [md:debug] [pid 7508:tid 1052] md_store_fs.c(690): purge
>>>> staging/apachelounge.nl (D:/servers/apacheS/md/staging/apachelounge.nl)
>>>> [md:debug] [pid 7508:tid 1052] md_acme.c(144): get directory from
>>>> https://acme-v01.api.letsencrypt.org/directory
>>>> [md:debug] [pid 7508:tid 1052] md_acme.c(407): req: POST
>>>> https://acme-v01.api.letsencrypt.org/directory
>>>> [md:debug] [pid 7508:tid 1052] md_curl.c(258): (20014)Internal error
>>>> (specific information not available): request 10 failed(60): Peer
>>>> certificate cannot be authenticated with given CA certificates
>>>
>>> Ok, this needs to be logged at ERROR level, so users do not have to mess
>>> with LogLevel to see what is going on.
>>>
>>> As for the reason, this seems to indicate that the curl client finds no way
>>> to verify the Let's Encrypt server certificate. Can you verify that the
>>> "curl.exe" can connect to "https://acme-v01.api.letsencrypt.org/directory"
>>> and retrieve the JSON there *without* you giving it the '-k' or
>>> '--insecure' option? And where does your curl.exe/libcurl come from? Did
>>> you build it yourself?
>>>
>>>> [md:debug] [pid 7508:tid 1052] md_acme.c(425): (20014)Internal error
>>>> (specific information not available): req sent
>>>> [md:error] [pid 7508:tid 1052] (20014)Internal error (specific information
>>>> not available): apachelounge.nl: setup
>>>> ACME(https://acme-v01.api.letsencrypt.org/directory)
>>>> [md:debug] [pid 7508:tid 1052] md_acme_drive.c(912): (20014)Internal error
>>>> (specific information not available): apachelounge.nl: ACME, ACME staging
>>>> [md:debug] [pid 7508:tid 1052] md_reg.c(891): (20014)Internal error
>>>> (specific information not available): apachelounge.nl: staging done
>>>> [md:error] [pid 7508:tid 1052] (20014)Internal error (specific information
>>>> not available): AH10056: processing apachelounge.nl
>>>> [md:info] [pid 7508:tid 1052] AH10057: apachelounge.nl: encountered error
>>>> for the 6. time, next run in 0:02:40 hours
>>>> ...
>>>>
>>>> Maybe a small solution: when httpd starts, mod_md checks whether LE is
>>>> reachable without error.
>>>
>>> No, I do not think checking external servers on every httpd restart is a
>>> good idea.
>>>
>>>> And a solution for the case below could be: check that port 443 and/or
>>>> 80 is used.
>>>>
>>>> Still my questions:
>>>>
>>>> Does the retry stop?
>>>
>>> The retry does not stop, but it uses longer and longer retry intervals.
>>> This is exactly to recover from errors with the ACME server that are
>>> recoverable, e.g. server/internet down. Your local certificate store
>>> being unable to verify the LE server will not recover by itself, however.
>>>
>>>> When does it happen, on what errors?
>>>
>>> On any error where signup/renew is necessary and could not complete.
>>>
>>>>
>>>>
>>>> Steffen
>>>>
>>>>
>>>> On Tuesday 12/12/2017 at 10:18, Stefan Eissing wrote:
>>>>> Can you switch to "LogLevel md:debug" for a while and send me the
>>>>> details? Did this start on the v1.1.0 or before that?
>>>>>
>>>>>> On 11.12.2017 at 16:09, Steffen <i...@apachelounge.com> wrote:
>>>>>>
>>>>>>
>>>>>> Running 1.1.0 with the new naming.
>>>>>>
>>>>>> When mod_md encounters an error, it looks like it goes into an endless
>>>>>> loop:
>>>>>>
>>>>>>
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>> error for the 1. time, next run in 0:00:05 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>> error for the 2. time, next run in 0:00:10 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>> error for the 3. time, next run in 0:00:20 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>> error for the 4. time, next run in 0:00:40 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>> error for the 5. time, next run in 0:01:20 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>> error for the 6. time, next run in 0:02:40 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>> error for the 7. time, next run in 0:05:20 hours
>>>>>> [md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
>>>>>> error for the 8. time, next run in 0:10:40 hours
>>>>>> ...
>>>>>> ...
>>>>>> ...
>>>>>>
>>>>>> The above is during renewal, using port 444.
>>>>>>
>>>>>> Apache is running fine because the certificate is still valid.
>>>>>>
>>>>>> Does it stop?
>>>>>>
>>>>>> When does it happen, on what errors? The above happens with:
>>>>>> (20014)Internal error (specific information not available): AH10056:
>>>>>> processing apachelounge.nl.
>>>>>>
>>>>>> What to do? Stopping the above retries can be tricky, because when the
>>>>>> ACME CA service is temporarily down or not reachable we may still want
>>>>>> a retry. A reachability/down error is different from a configuration
>>>>>> error like the one causing the case above.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
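
For reference, the intervals in the quoted AH10057 lines double after each
error, starting at 5 seconds. Below is a minimal Python sketch of that
schedule. It is inferred only from the log output in this thread, not taken
from the mod_md source. The names retry_delay and format_delay and the
optional cap_seconds parameter are hypothetical; whether mod_md caps the
interval at all is not stated here.

```python
from datetime import timedelta

# First delay seen in the quoted log ("next run in 0:00:05 hours").
BASE_DELAY_SECONDS = 5


def retry_delay(error_count, cap_seconds=None):
    """Return the wait before the next run after `error_count` errors in a row.

    The delay doubles with every further error. `cap_seconds` is a
    hypothetical upper bound, added for illustration only.
    """
    if error_count < 1:
        raise ValueError("error_count must be >= 1")
    delay = BASE_DELAY_SECONDS * 2 ** (error_count - 1)
    if cap_seconds is not None:
        delay = min(delay, cap_seconds)
    return timedelta(seconds=delay)


def format_delay(delay):
    """Format a timedelta as h:mm:ss, the layout used in the log lines."""
    total = int(delay.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"encountered error for the {n}. time, "
              f"next run in {format_delay(retry_delay(n))} hours")
```

Run as a script, this prints the same eight intervals as the log above,
from 0:00:05 up to 0:10:40. With plain doubling the interval passes one day
at the 16th error, which is why a hard error can go unnoticed for a long
time if nobody reads the error.log.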