Re: mod_md 1.1.0 repeating on error

Steffen Tue, 12 Dec 2017 05:18:07 -0800

To be clear: as I said, I introduced the curl error myself, so I know exactly what is wrong.

Your reply shows me that you want to keep the endless retry loop. In the worst case a user can end up with non-working SSL because a certificate is not renewed.

Why is it retried again and again? These all look like hard errors, except when LE is temporarily down.

I think it should be fixed. Not everyone is constantly looking at the error.log.



What I would like:

Use MDNotifyCmd for the first error AH10057.

Right now MDNotifyCmd is only triggered when everything is OK; it seems logical to also notify when something is wrong.
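
The idea of notifying on the first failure could be sketched roughly like this. It is only an illustration in Python, not mod_md code: the class RenewalTracker, its methods and the notify callback are hypothetical names, and nothing here reflects how MDNotifyCmd is actually invoked.

```python
from collections import defaultdict


class RenewalTracker:
    """Hypothetical tracker: calls notify() once per streak of errors."""

    def __init__(self, notify):
        self.notify = notify                  # callable(domain, message)
        self.error_counts = defaultdict(int)  # consecutive errors per domain

    def record_error(self, domain, message):
        self.error_counts[domain] += 1
        # Notify only on the first error of a streak, not on every retry.
        if self.error_counts[domain] == 1:
            self.notify(domain, message)
        return self.error_counts[domain]

    def record_success(self, domain):
        # A successful renewal ends the streak, so the next error notifies again.
        self.error_counts[domain] = 0


sent = []
tracker = RenewalTracker(lambda domain, message: sent.append((domain, message)))
tracker.record_error("example.org", "first error")
tracker.record_error("example.org", "second error")
print(sent)
```

Only the first error in a row reaches the callback, so a long retry streak would not flood whoever receives the notification.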




On Tuesday 12/12/2017 at 13:58, Stefan Eissing  wrote:

On 12.12.2017 at 13:47, Steffen <[email protected]> wrote:
It was happening before 1.1.0, but I did not pay attention to it. I have seen it in several situations, not all of which I can recall, unfortunately (see the retries, for example, in https://github.com/icing/mod_md/issues/52 and https://github.com/icing/mod_md/issues/62 ).
It is a more serious issue than I thought before.
I think we must fix this first, otherwise it is a bad introduction for our users. This is because first-time users in the Windows community learned that they are dealing with it and are dealing with all kinds of (try) errors; most users stopped using it. As said in another post, mod_md is not that easy to start with.
Also, when the LogLevel is at the default warn, users hardly see what is happening. I advise our users to use LogLevel info md:trace2 ssl:notice
The endless retry loop: tested now in the following situations, during renewal when no new certificate is generated, with httpd running fine with the old certificate, which was still valid.
1 - Mis-configuration like below.
2 - ACME CA service down (cause: Letsencrypt down)
3 - ACME CA service not reachable (cause: local network, or OS failure/misconfig)
4 - Error response (GET/POST errors) when accessing Letsencrypt, dependency issue like curl, mod_ssl.
5 - mod_md/mod_ssl faults
6 - There should be more
2) 3) Both can mean that Letsencrypt is temporarily down, so maybe retry there, but it is hard to tell whether the cause is LE being temporarily down, a local issue, or an OS misconfig.
4) Is a good example: an error response from LE, which happens in quite a few situations: curl issues, rate limits, mod_md faults, etc.
Below I introduced a Curl issue:

...
[md:debug] [pid 7508:tid 1052] mod_md.c(762): AH10055: md watchdog run, auto drive 2 mds
[md:debug] [pid 7508:tid 1052] mod_md.c(691): AH10052: md(apachelounge.nl): state=2, driving
[md:debug] [pid 7508:tid 1052] md_reg.c(884): apachelounge.nl: run staging
[md:debug] [pid 7508:tid 1052] md_acme_drive.c(690): apachelounge.nl: staging started, state=2, can_http=0, can_https=1, challenges='tls-sni-01'
[md:debug] [pid 7508:tid 1052] md_store_fs.c(690): purge staging/apachelounge.nl (D:/servers/apacheS/md/staging/apachelounge.nl)
[md:debug] [pid 7508:tid 1052] md_acme.c(144): get directory from https://acme-v01.api.letsencrypt.org/directory
[md:debug] [pid 7508:tid 1052] md_acme.c(407): req: POST https://acme-v01.api.letsencrypt.org/directory
[md:debug] [pid 7508:tid 1052] md_curl.c(258): (20014)Internal error (specific information not available): request 10 failed(60): Peer certificate cannot be authenticated with given CA certificates
Ok, this needs to be logged at ERROR level, so users do not have to mess with LogLevel to see what is going on.
As for the reason, this seems to indicate that the curl client finds no way to verify the Let's Encrypt server certificate. Can you verify that the "curl.exe" can connect to "https://acme-v01.api.letsencrypt.org/directory" and retrieve the JSON there *without* you giving it the '-k' or '--insecure' option? And where does your curl.exe/libcurl come from? Did you build it yourself?
[md:debug] [pid 7508:tid 1052] md_acme.c(425): (20014)Internal error (specific information not available): req sent
[md:error] [pid 7508:tid 1052] (20014)Internal error (specific information not available): apachelounge.nl: setup ACME(https://acme-v01.api.letsencrypt.org/directory)
[md:debug] [pid 7508:tid 1052] md_acme_drive.c(912): (20014)Internal error (specific information not available): apachelounge.nl: ACME, ACME staging
[md:debug] [pid 7508:tid 1052] md_reg.c(891): (20014)Internal error (specific information not available): apachelounge.nl: staging done
[md:error] [pid 7508:tid 1052] (20014)Internal error (specific information not available): AH10056: processing apachelounge.nl
[md:info] [pid 7508:tid 1052] AH10057: apachelounge.nl: encountered error for the 6. time, next run in 0:02:40 hours
...
Maybe a small solution: when starting httpd, mod_md checks whether LE is reachable without error.
No, I do not think checking external servers on every httpd restart is a good idea.
And a solution for the one below could be: add a check that 443 and/or 80 is used.
Still my questions:

Does the retry stop?
The retry does not stop, but it uses longer and longer retry intervals, exactly to recover from errors with the ACME server that are recoverable, e.g. server/internet down. Your local certificate store not being able to verify the LE server will not recover by itself, however.
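
The AH10057 lines quoted in this thread show the wait doubling after every error (5 s, 10 s, 20 s, ...). A minimal Python sketch of such a schedule follows. The function names are made up for illustration, the 5 second start is simply read off the log, and this is not taken from the mod_md source:

```python
def retry_delays(first_delay_s=5, attempts=8):
    """Delay in seconds after each consecutive error, doubling each time."""
    return [first_delay_s * 2 ** n for n in range(attempts)]


def fmt(seconds):
    """Format seconds as h:mm:ss, the layout seen in the AH10057 messages."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


for n, delay in enumerate(retry_delays(), start=1):
    print(f"error {n}: next run in {fmt(delay)}")
```

With these assumed defaults the eighth error gives 0:10:40, the same value as in the log. Whether the real interval is capped at some point is not stated in this thread, so the sketch leaves that out.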
When does it happen, on what errors?
On any error where signup/renew is necessary and could not complete.
Steffen


On Tuesday 12/12/2017 at 10:18, Stefan Eissing wrote:
Can you switch to "LogLevel md:debug" for a while and send me the details? Did this start with v1.1.0 or before that?
On 11.12.2017 at 16:09, Steffen <[email protected]> wrote:


Running 1.1.0 with the new naming.
When mod_md encounters an error, it looks like it is going into an endless loop:
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 1. time, next run in 0:00:05 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 2. time, next run in 0:00:10 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 3. time, next run in 0:00:20 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 4. time, next run in 0:00:40 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 5. time, next run in 0:01:20 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 6. time, next run in 0:02:40 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 7. time, next run in 0:05:20 hours
[md:info] [pid 10372:tid 1964] AH10057: apachelounge.nl: encountered error for the 8. time, next run in 0:10:40 hours
...
...
...

The above is during renewal and using port 444.

Apache is running fine because the certificate is still valid.

Does it stop?
When does it happen, on what errors? The above happens when: (20014)Internal error (specific information not available): AH10056: processing apachelounge.nl.
What to do? Stopping on the above retries can be tricky, because when the ACME CA service is temporarily down or not reachable we may still want a retry. A not-reachable/down error is different from a configuration error causing it, like in the above case.
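
One way to picture the distinction is a small decision function that keeps retrying on errors that may clear up by themselves and gives up (and notifies) on the others. This Python sketch is purely hypothetical: the grouping of codes, the function name next_action and the hard_limit parameter are my assumptions, not mod_md behaviour. The numeric values 6, 7 and 28 are libcurl result codes, and 60 is the code shown in the md_curl.c log line in this thread.

```python
# libcurl result codes
CURLE_COULDNT_RESOLVE_HOST = 6
CURLE_COULDNT_CONNECT = 7
CURLE_OPERATION_TIMEDOUT = 28
CERT_NOT_VERIFIED = 60  # "Peer certificate cannot be authenticated ..."

# Assumption: these may recover on their own (CA or network temporarily down).
TRANSIENT_CODES = {
    CURLE_COULDNT_RESOLVE_HOST,
    CURLE_COULDNT_CONNECT,
    CURLE_OPERATION_TIMEDOUT,
}


def next_action(curl_code, error_count, hard_limit=3):
    """Return 'retry' or 'notify-and-stop' for a failed request.

    hard_limit is a made-up knob: how many consecutive non-transient
    errors are tolerated before giving up.
    """
    if curl_code in TRANSIENT_CODES:
        return "retry"
    if error_count >= hard_limit:
        return "notify-and-stop"
    return "retry"


print(next_action(CURLE_COULDNT_CONNECT, error_count=8))
print(next_action(CERT_NOT_VERIFIED, error_count=3))
```

The weak spot is the one already mentioned: a connect failure can just as well come from a local misconfiguration, so any such split stays a judgment call.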
