It was happening before 1.1.0, but I did not pay attention to it. I
have seen it in several situations, not all of which I can
unfortunately recall (see the retries in, for example,
https://github.com/icing/mod_md/issues/52 and
https://github.com/icing/mod_md/issues/62 ).
It is a more serious issue than I thought before.
I think we must fix this first, otherwise it is a bad introduction for
our users. Windows community users who tried it for the first time
found that they had to deal with all kinds of (try) errors, and most
of them stopped using it. As said in another post, mod_md is not that
easy to start with.
Also, when the LogLevel is at the default warn, users can hardly see
what is happening. I advise our users to use:
LogLevel info md:trace2 ssl:notice
The endless retry loop: I have now tested it in the following
situations. All tests were during renewal; no new certificate was
generated, and httpd kept running fine with the old certificate, which
was still valid.
1 - Mis-configuration like below.
2 - ACME CA service down (cause Letsencrypt down)
3 - ACME CA service not reachable (cause local network, or OS
failure/misconfig)
4 - Error response (GET/POST errors) when accessing Letsencrypt,
dependency issues like curl, mod_ssl.
5 - mod_md/mod_ssl faults
6 - There are probably more
2) and 3): in both cases Letsencrypt may be temporarily down, so a
retry may make sense there, but it is hard to tell whether the cause
is a temporary LE outage, a local issue or an OS misconfiguration.
4) is a good example: an error response from LE, which happens in
quite a few situations: curl issues, rate limits, mod_md faults, etc.
Below I introduced a Curl issue:
...
[md:debug] [pid 7508:tid 1052] mod_md.c(762): AH10055: md watchdog
run, auto drive 2 mds
[md:debug] [pid 7508:tid 1052] mod_md.c(691): AH10052:
md(apachelounge.nl): state=2, driving
[md:debug] [pid 7508:tid 1052] md_reg.c(884): apachelounge.nl: run
staging
[md:debug] [pid 7508:tid 1052] md_acme_drive.c(690): apachelounge.nl:
staging started, state=2, can_http=0, can_https=1,
challenges='tls-sni-01'
[md:debug] [pid 7508:tid 1052] md_store_fs.c(690): purge
staging/apachelounge.nl
(D:/servers/apacheS/md/staging/apachelounge.nl)
[md:debug] [pid 7508:tid 1052] md_acme.c(144): get directory from
https://acme-v01.api.letsencrypt.org/directory
[md:debug] [pid 7508:tid 1052] md_acme.c(407): req: POST
https://acme-v01.api.letsencrypt.org/directory
[md:debug] [pid 7508:tid 1052] md_curl.c(258): (20014)Internal error
(specific information not available): request 10 failed(60): Peer
certificate cannot be authenticated with given CA certificates
[md:debug] [pid 7508:tid 1052] md_acme.c(425): (20014)Internal error
(specific information not available): req sent
[md:error] [pid 7508:tid 1052] (20014)Internal error (specific
information not available): apachelounge.nl: setup
ACME(https://acme-v01.api.letsencrypt.org/directory)
[md:debug] [pid 7508:tid 1052] md_acme_drive.c(912): (20014)Internal
error (specific information not available): apachelounge.nl: ACME,
ACME staging
[md:debug] [pid 7508:tid 1052] md_reg.c(891): (20014)Internal error
(specific information not available): apachelounge.nl: staging done
[md:error] [pid 7508:tid 1052] (20014)Internal error (specific
information not available): AH10056: processing apachelounge.nl
[md:info] [pid 7508:tid 1052] AH10057: apachelounge.nl: encountered
error for the 6. time, next run in 0:02:40 hours
...
Maybe a small solution: when httpd starts, mod_md checks whether LE is
reachable without error.
And a solution for the case below could be: check that port 443
and/or 80 is in use.
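
The two checks suggested above could look roughly like the Python
sketch below. It is only an illustration of the idea, not mod_md code:
the function names, the timeouts and the choice of 127.0.0.1 are my
own assumptions, and the directory URL has to be passed in by the
caller.

```python
import socket
import urllib.error
import urllib.request


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_challenge_ports(host: str = "127.0.0.1", ports=(80, 443)) -> list:
    """Return the ports from `ports` on which nothing accepts connections."""
    return [p for p in ports if not port_is_listening(host, p)]


def acme_directory_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the ACME directory URL answers with HTTP 200.

    Network failures and TLS verification failures (like the curl
    error 60 in the log above) both end up as False here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

With a server listening on port 444 only, missing_challenge_ports()
would return [80, 443], which could be reported once at startup
instead of being discovered through repeated retries.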
Still my questions:
Does the retry stop?
When does it happen, and on which errors?
Steffen
On Tuesday 12/12/2017 at 10:18, Stefan Eissing wrote:
Can you switch to "LogLevel md:debug" for a while and send me the
details? Did this start on the v1.1.0 or before that?
On 11.12.2017 at 16:09, Steffen <i...@apachelounge.com> wrote:
Running 1.1.0 with the new naming.
When mod_md encounters an error, it looks like it goes into an endless
loop:
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
error for the 1. time, next run in 0:00:05 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
error for the 2. time, next run in 0:00:10 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
error for the 3. time, next run in 0:00:20 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
error for the 4. time, next run in 0:00:40 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
error for the 5. time, next run in 0:01:20 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
error for the 6. time, next run in 0:02:40 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
error for the 7. time, next run in 0:05:20 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered
error for the 8. time, next run in 0:10:40 hours
...
...
...
The above is during renewal, using port 444.
Apache is running fine because the certificate is still valid.
Does it stop?
When does it happen, and on which errors? The above happens with:
(20014)Internal error (specific information not available): AH10056:
processing apachelounge.nl.
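
The intervals in the AH10057 lines above double with every error. A
small Python sketch reproduces them, assuming the logged value is
h:mm:ss and the first delay is 5 seconds. This is only what the log
suggests, not a reading of the mod_md source; the function name is
hypothetical, and the sketch has no upper limit because the log does
not show one.

```python
from datetime import timedelta


def retry_delay(error_count: int, base_seconds: int = 5) -> timedelta:
    """Delay before the next run after the error_count-th consecutive error.

    Assumption: the delay starts at base_seconds and doubles each time.
    """
    if error_count < 1:
        raise ValueError("error_count must be >= 1")
    return timedelta(seconds=base_seconds * 2 ** (error_count - 1))


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"encountered error for the {n}. time, "
              f"next run in {retry_delay(n)} hours")
```

If the doubling simply continued, the 15th error would already wait
almost 23 hours, so the practical question is whether there is a cap
or a stop.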
What to do? Stopping the above retries can be tricky, because when the
ACME CA service is temporarily down or not reachable we may well want
a retry. A reachability/down error is different from a configuration
error that causes it, as in the case above.
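
The distinction above could be expressed as a small retry policy. The
Python sketch below is only an illustration: the error categories, the
names and the limit of 10 retries are assumptions of mine, and the
hard part (deciding which category an actual error belongs to) is not
solved by it.

```python
from enum import Enum, auto


class ErrorKind(Enum):
    CA_UNREACHABLE = auto()     # CA down or network problem: may be temporary
    CA_ERROR_RESPONSE = auto()  # CA answered with an error, e.g. a rate limit
    LOCAL_CONFIG = auto()       # local misconfiguration: retrying cannot help


# Kinds of error for which a later attempt has a chance to succeed.
TRANSIENT = {ErrorKind.CA_UNREACHABLE, ErrorKind.CA_ERROR_RESPONSE}


def should_retry(kind: ErrorKind, error_count: int, max_retries: int = 10) -> bool:
    """Retry transient errors up to max_retries; never retry config errors."""
    if kind not in TRANSIENT:
        return False
    return error_count < max_retries
```

With such a policy, a wrong port would be reported once and not
retried, while a temporarily unreachable CA would still get a limited
number of retries.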