Re: Bug#993821: After upgrading libc, some services are unable to restart (including systemd-resolved)

2021-09-07 Thread Michael Hudson-Doyle
On Wed, 8 Sept 2021 at 07:04, Michael Biebl  wrote:

> Hi Aurelien
>
> Am 07.09.21 um 12:41 schrieb Aurelien Jarno:
> > Hi,
> >
> > On 2021-09-07 10:39, Michael Hudson-Doyle wrote:
>
> >> What's happening is that systemd is running with the old glibc, forks and
> >> then does NSS things that cause the new glibc's NSS modules to load and
> >> they don't necessarily work, leading to failures in any unit that specifies
> >> User=. At least for Ubuntu's builds the NSS modules seem to be ABI
> >> compatible between 2.32 and 2.33 (I didn't try 2.31 vs 2.32) but they are
> >> definitely not between 2.33 and 2.34.
> >
> > Thanks for this feedback and the pointer to the patch used in Ubuntu. It
> > seems to be a good solution, and matches what is done for other init
> > systems.
> >
> > On the other hand, the problem is supposed to only happen for major
> > glibc version upgrades, where the NSS modules might have a different ABI.
> > In that regard, I would be tempted to restart it only for major version
> > upgrades, like it's done for other daemons. Now if the systemd maintainers
> > consider it fine to restart it for each glibc upgrade, we should
> > probably go that way.
>
> I guess you are in a better position to make a judgement call here. If I
> read the glibc bug report correctly, there aren't strictly any
> guarantees regarding NSS modules. What that means for glibc minor
> updates, I'm not really in a position to tell.
>

I think in practice minor version updates are probably going to be fine
here, but I also think careful re-execing on every update is likely to
be fine in practice.

If you wanted to be suuurrr paranoid, I guess you could embed in
the glibc postinst knowledge of which prior versions have binary-compatible
NSS modules but that seems like a lot of work for not much benefit (would
you only have to care about nss_files compatibility, or the full set?).
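To make the "embed knowledge in the postinst" idea concrete, here is a minimal
Python sketch (a real postinst would be a shell script). It assumes a
hypothetical lookup table, NSS_ABI_GROUP, mapping upstream glibc versions to an
NSS ABI group. The only data in it is the observation earlier in this thread
about Ubuntu's builds (2.32 and 2.33 compatible, 2.33 and 2.34 not); any
version missing from the table is treated as incompatible, so the caller falls
back to re-executing systemd.

```python
# Hypothetical table: NSS_ABI_GROUP is an illustrative name, not an existing
# glibc or Debian artifact. Its only entries come from the observation in
# this thread about Ubuntu's builds; nothing else here has been verified.
NSS_ABI_GROUP = {
    "2.32": "A",
    "2.33": "A",  # reported ABI compatible with 2.32 (Ubuntu builds)
    "2.34": "B",  # reported incompatible with 2.33
}


def nss_modules_compatible(old_upstream: str, new_upstream: str) -> bool:
    """Return True only if both versions are known to share an NSS ABI group."""
    old_group = NSS_ABI_GROUP.get(old_upstream)
    new_group = NSS_ABI_GROUP.get(new_upstream)
    # Unknown versions count as incompatible, so the caller re-execs systemd.
    return old_group is not None and old_group == new_group


if __name__ == "__main__":
    for old, new in [("2.32", "2.33"), ("2.33", "2.34"), ("2.31", "2.32")]:
        action = "skip re-exec" if nss_modules_compatible(old, new) else "re-exec"
        print(f"{old} -> {new}: {action}")
```

Whether such a table should track only nss_files or every NSS module glibc
ships is exactly the open question here; the sketch treats the whole set as
one unit.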


> Fwiw, I don't have a better proposal than Michael's patch he added to
> Ubuntu. We could run with that and if it causes problems, iterate on it.
>

Yeah, the point where we start to offer updates to 21.10 will at the least
provide some data on how safe Ubuntu's approach is...

Cheers,
mwh


Re: Bug#993821: After upgrading libc, some services are unable to restart (including systemd-resolved)

2021-09-07 Thread Michael Biebl

Hi Aurelien

Am 07.09.21 um 12:41 schrieb Aurelien Jarno:

> Hi,
>
> On 2021-09-07 10:39, Michael Hudson-Doyle wrote:
>
>> What's happening is that systemd is running with the old glibc, forks and
>> then does NSS things that cause the new glibc's NSS modules to load and
>> they don't necessarily work, leading to failures in any unit that specifies
>> User=. At least for Ubuntu's builds the NSS modules seem to be ABI
>> compatible between 2.32 and 2.33 (I didn't try 2.31 vs 2.32) but they are
>> definitely not between 2.33 and 2.34.
>
> Thanks for this feedback and the pointer to the patch used in Ubuntu. It
> seems to be a good solution, and matches what is done for other init
> systems.
>
> On the other hand, the problem is supposed to only happen for major
> glibc version upgrades, where the NSS modules might have a different ABI.
> In that regard, I would be tempted to restart it only for major version
> upgrades, like it's done for other daemons. Now if the systemd maintainers
> consider it fine to restart it for each glibc upgrade, we should
> probably go that way.


I guess you are in a better position to make a judgement call here. If I 
read the glibc bug report correctly, there aren't strictly any 
guarantees regarding NSS modules. What that means for glibc minor 
updates, I'm not really in a position to tell.


Fwiw, I don't have a better proposal than Michael's patch he added to
Ubuntu. We could run with that and if it causes problems, iterate on it.


Regards,
Michael



Re: Bug#993821: After upgrading libc, some services are unable to restart (including systemd-resolved)

2021-09-07 Thread Aurelien Jarno
Hi,

On 2021-09-07 10:39, Michael Hudson-Doyle wrote:
> On Tue, 7 Sept 2021 at 10:21, Michael Biebl  wrote:
> 
> > Am 06.09.21 um 23:45 schrieb Vincent Bernat:
> >  > Package: systemd
> >  > Version: 247.9-1
> >  > Severity: normal
> >  >
> > > Hey!
> > >
> > > After upgrading to libc6 2.32-1, some services are unable to restart.
> > > In my case, systemd-resolved, systemd-timesyncd and colord. Using
> > > "systemctl daemon-reexec" fixes the issue. Unsure if there is really
> > > something to be fixed but as I didn't find anything about that, a bug
> > > report may help others. I suppose the problem is related to NSS.
> > >
> > > Sep 06 23:06:43 chocobo systemd[1]: Starting Network Time Synchronization...
> > > Sep 06 23:06:43 chocobo systemd[236983]: systemd-timesyncd.service: Failed to determine user credentials: No such process
> > > Sep 06 23:06:43 chocobo systemd[236983]: systemd-timesyncd.service: Failed at step USER spawning /lib/systemd/systemd-timesyncd: No such process
> > >
> > >
> >
> >
> > @libc maintainers: any ideas what could be causing this? If this is
> > triggered by a libc6 update, should this be reassigned to glibc?
> >
> 
> We went through this in Ubuntu recently and decided that restarting systemd
> in glibc's postinst was the safest option:
> https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1942276
> 
> What's happening is that systemd is running with the old glibc, forks and
> then does NSS things that cause the new glibc's NSS modules to load and
> they don't necessarily work, leading to failures in any unit that specifies
> User=. At least for Ubuntu's builds the NSS modules seem to be ABI
> compatible between 2.32 and 2.33 (I didn't try 2.31 vs 2.32) but they are
> definitely not between 2.33 and 2.34.

Thanks for this feedback and the pointer to the patch used in Ubuntu. It
seems to be a good solution, and matches what is done for other init
systems.

On the other hand, the problem is supposed to only happen for major
glibc version upgrades, where the NSS modules might have a different ABI.
In that regard, I would be tempted to restart it only for major version
upgrades, like it's done for other daemons. Now if the systemd maintainers
consider it fine to restart it for each glibc upgrade, we should
probably go that way.
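For what it's worth, the "restart only on major upgrades" gate could look
roughly like the Python sketch below (the real check would live in the shell
postinst). It relies on dpkg invoking postinst as
`configure <most-recently-configured-version>`, with an empty version on a
fresh install. The helper names upstream_version and is_major_upgrade are
hypothetical, and "major" is taken to mean that the upstream X.Y release
changes (2.31 to 2.32), while a Debian revision bump such as 2.32-1 to 2.32-2
does not count.

```python
def upstream_version(debian_version: str) -> str:
    """Strip an optional epoch and the Debian revision: '2.32-1' -> '2.32'."""
    without_epoch = debian_version.split(":", 1)[-1]
    return without_epoch.rsplit("-", 1)[0]


def is_major_upgrade(old_version: str, new_version: str) -> bool:
    """True if the upstream X.Y release differs between the two versions."""
    if not old_version:
        # Fresh install: no daemon can be running against an older libc.
        return False
    old_xy = upstream_version(old_version).split(".")[:2]
    new_xy = upstream_version(new_version).split(".")[:2]
    return old_xy != new_xy


if __name__ == "__main__":
    print(is_major_upgrade("2.31-17", "2.32-1"))  # upstream release changed
    print(is_major_upgrade("2.32-1", "2.32-2"))   # Debian revision only
```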

Regards,
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Re: Bug#993821: After upgrading libc, some services are unable to restart (including systemd-resolved)

2021-09-07 Thread Michael Hudson-Doyle
On Tue, 7 Sept 2021 at 17:49, Michael Biebl  wrote:

> Control: reassign -1 libc6
> Control: found -1 2.32-1
> Control: severity -1 serious
> Control: affects -1 + systemd
>
> Hi Michael
>
> Am 07.09.21 um 00:39 schrieb Michael Hudson-Doyle:
> > On Tue, 7 Sept 2021 at 10:21, Michael Biebl wrote:
> >
> > Am 06.09.21 um 23:45 schrieb Vincent Bernat:
> >   > Package: systemd
> >   > Version: 247.9-1
> >   > Severity: normal
> >   >
> >  > Hey!
> >  >
> >  > After upgrading to libc6 2.32-1, some services are unable to restart.
> >  > In my case, systemd-resolved, systemd-timesyncd and colord. Using
> >  > "systemctl daemon-reexec" fixes the issue. Unsure if there is really
> >  > something to be fixed but as I didn't find anything about that, a bug
> >  > report may help others. I suppose the problem is related to NSS.
> >  >
> >  > Sep 06 23:06:43 chocobo systemd[1]: Starting Network Time Synchronization...
> >  > Sep 06 23:06:43 chocobo systemd[236983]: systemd-timesyncd.service: Failed to determine user credentials: No such process
> >  > Sep 06 23:06:43 chocobo systemd[236983]: systemd-timesyncd.service: Failed at step USER spawning /lib/systemd/systemd-timesyncd: No such process
> >  >
> >  >
> >
> >
> > @libc maintainers: any ideas what could be causing this? If this is
> > triggered by a libc6 update, should this be reassigned to glibc?
> >
> >
> > We went through this in Ubuntu recently and decided that restarting
> > systemd in glibc's postinst was the safest option:
> > https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1942276
> > 
> >
> > What's happening is that systemd is running with the old glibc, forks
> > and then does NSS things that cause the new glibc's NSS modules to load
> > and they don't necessarily work, leading to failures in any unit that
> > specifies User=. At least for Ubuntu's builds the NSS modules seem to be
> > ABI compatible between 2.32 and 2.33 (I didn't try 2.31 vs 2.32) but
> > they are definitely not between 2.33 and 2.34.
>
> Thanks for this information. This is indeed an icky issue and I feel
> like we are between a rock and a hard place.
>

Yeah. I guess one could say that having a long-running process that forks
and then does NSS stuff is skating on thin ice a bit. At least the changes
in glibc 2.34 to move nss_files functionality into glibc itself will reduce
the fallout of this considerably.
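The pattern in question (a long-running parent forks, and only the child does
the user lookup that User= needs) can be illustrated with a minimal Python
sketch. pwd.getpwnam() goes through glibc's NSS dispatch (nsswitch.conf and
whichever libnss_* modules it names), so the child exercises the same kind of
code path. This only shows the shape of the problem and will not reproduce the
failure on a consistent system; the function name lookup_user_in_child is made
up for illustration.

```python
import os
import pwd


def lookup_user_in_child(name: str) -> bool:
    """Fork, resolve `name` via NSS in the child only, and report success."""
    pid = os.fork()
    if pid == 0:
        # Child: this is where a forked worker would resolve User=
        # (pwd.getpwnam() calls into glibc's NSS machinery).
        try:
            pwd.getpwnam(name)
            code = 0
        except KeyError:
            code = 1  # no such user (or the NSS backend returned nothing)
        except Exception:
            code = 2  # any other failure during the lookup
        os._exit(code)  # leave the child without running the parent's cleanup
    # Parent: does no NSS lookup here, it only waits for the child.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print(lookup_user_in_child("root"))
```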


> I'm not a huge fan of going back to re-exec systemd again directly in
> libc6.postinst, but your proposed patch to at least check that the
> systemd binary can be successfully executed should at least deal
> sufficiently with the situation where a library is (temporarily) missing.
> I do wonder, though, if this will mean that on dist-upgrades the
> daemon-reexec will be skipped.
>

FWIW I had a long chat with Julian (the apt maintainer) about this and he
thought there were three potential situations that could be a problem:

1) a new systemd is unpacked before its Depends
2) one of systemd's dependencies has a Breaks: systemd (<< new)
3) in some cases a cycle has to be broken by removing a package with
--force-deps

I think 1) is by some margin the most likely to actually happen, and at
least in that situation systemd will be restarted shortly by its own
postinst.

Cheers,
Michael

> Anyway, I think it's best to reassign this to libc6 for now and mark it as
> RC so the package doesn't migrate to testing.
>
> Regards,
> Michael
>
>


Processed: Re: Bug#993821: After upgrading libc, some services are unable to restart (including systemd-resolved)

2021-09-07 Thread Debian Bug Tracking System
Processing control commands:

> reassign -1 libc6
Bug #993821 [systemd] After upgrading libc, some services are unable to restart 
(including systemd-resolved)
Bug reassigned from package 'systemd' to 'libc6'.
No longer marked as found in versions systemd/247.9-1.
Ignoring request to alter fixed versions of bug #993821 to the same values 
previously set
> found -1 2.32-1
Bug #993821 [libc6] After upgrading libc, some services are unable to restart 
(including systemd-resolved)
Marked as found in versions glibc/2.32-1.
> severity -1 serious
Bug #993821 [libc6] After upgrading libc, some services are unable to restart 
(including systemd-resolved)
Severity set to 'serious' from 'normal'
> affects -1 + systemd
Bug #993821 [libc6] After upgrading libc, some services are unable to restart 
(including systemd-resolved)
Added indication that 993821 affects systemd

-- 
993821: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=993821
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Re: Bug#993821: After upgrading libc, some services are unable to restart (including systemd-resolved)

2021-09-07 Thread Michael Biebl

Control: reassign -1 libc6
Control: found -1 2.32-1
Control: severity -1 serious
Control: affects -1 + systemd

Hi Michael

Am 07.09.21 um 00:39 schrieb Michael Hudson-Doyle:
> On Tue, 7 Sept 2021 at 10:21, Michael Biebl wrote:
>
>> Am 06.09.21 um 23:45 schrieb Vincent Bernat:
>>> Package: systemd
>>> Version: 247.9-1
>>> Severity: normal
>>>
>>> Hey!
>>>
>>> After upgrading to libc6 2.32-1, some services are unable to restart.
>>> In my case, systemd-resolved, systemd-timesyncd and colord. Using
>>> "systemctl daemon-reexec" fixes the issue. Unsure if there is really
>>> something to be fixed but as I didn't find anything about that, a bug
>>> report may help others. I suppose the problem is related to NSS.
>>>
>>> Sep 06 23:06:43 chocobo systemd[1]: Starting Network Time Synchronization...
>>> Sep 06 23:06:43 chocobo systemd[236983]: systemd-timesyncd.service: Failed to determine user credentials: No such process
>>> Sep 06 23:06:43 chocobo systemd[236983]: systemd-timesyncd.service: Failed at step USER spawning /lib/systemd/systemd-timesyncd: No such process
>>
>> @libc maintainers: any ideas what could be causing this? If this is
>> triggered by a libc6 update, should this be reassigned to glibc?
>
> We went through this in Ubuntu recently and decided that restarting
> systemd in glibc's postinst was the safest option:
> https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1942276
>
> What's happening is that systemd is running with the old glibc, forks
> and then does NSS things that cause the new glibc's NSS modules to load
> and they don't necessarily work, leading to failures in any unit that
> specifies User=. At least for Ubuntu's builds the NSS modules seem to be
> ABI compatible between 2.32 and 2.33 (I didn't try 2.31 vs 2.32) but
> they are definitely not between 2.33 and 2.34.


Thanks for this information. This is indeed an icky issue and I feel 
like we are between a rock and a hard place.
I'm not a huge fan of going back to re-exec systemd again directly in
libc6.postinst, but your proposed patch to at least check that the
systemd binary can be successfully executed should at least deal
sufficiently with the situation where a library is (temporarily) missing.
I do wonder, though, if this will mean that on dist-upgrades the
daemon-reexec will be skipped.
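The "check that the systemd binary can be executed first" idea can be sketched
in Python as below; it is not Ubuntu's actual patch. It assumes that running
the binary with --version is a cheap enough probe (if a shared library it
needs is missing, the dynamic loader fails and the exit status is non-zero).
can_execute and maybe_reexec are hypothetical names; `systemctl daemon-reexec`
is the command from the original report.

```python
import subprocess
from typing import Sequence


def can_execute(argv: Sequence[str], timeout: float = 10.0) -> bool:
    """Return True if argv starts and exits with status 0."""
    try:
        result = subprocess.run(
            list(argv),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, not executable, or hanging: treat it as unusable.
        return False
    return result.returncode == 0


def maybe_reexec(systemd_binary: str = "/lib/systemd/systemd",
                 dry_run: bool = True) -> str:
    """Re-exec PID 1 only if the on-disk systemd binary can actually run."""
    if not can_execute([systemd_binary, "--version"]):
        return "skipped"  # e.g. a library is temporarily missing mid-upgrade
    if not dry_run:
        subprocess.run(["systemctl", "daemon-reexec"], check=True)
    return "reexec"


if __name__ == "__main__":
    print(maybe_reexec())  # dry run: only probes the binary
```

This also makes the trade-off visible: if the probe fails during a
dist-upgrade, the re-exec is skipped rather than retried.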


Anyway, I think it's best to reassign this to libc6 for now and mark it as
RC so the package doesn't migrate to testing.


Regards,
Michael





Re: Bug#993821: After upgrading libc, some services are unable to restart (including systemd-resolved)

2021-09-06 Thread Michael Hudson-Doyle
On Tue, 7 Sept 2021 at 10:21, Michael Biebl  wrote:

> Am 06.09.21 um 23:45 schrieb Vincent Bernat:
>  > Package: systemd
>  > Version: 247.9-1
>  > Severity: normal
>  >
> > Hey!
> >
> > After upgrading to libc6 2.32-1, some services are unable to restart.
> > In my case, systemd-resolved, systemd-timesyncd and colord. Using
> > "systemctl daemon-reexec" fixes the issue. Unsure if there is really
> > something to be fixed but as I didn't find anything about that, a bug
> > report may help others. I suppose the problem is related to NSS.
> >
> > Sep 06 23:06:43 chocobo systemd[1]: Starting Network Time Synchronization...
> > Sep 06 23:06:43 chocobo systemd[236983]: systemd-timesyncd.service: Failed to determine user credentials: No such process
> > Sep 06 23:06:43 chocobo systemd[236983]: systemd-timesyncd.service: Failed at step USER spawning /lib/systemd/systemd-timesyncd: No such process
> >
> >
>
>
> @libc maintainers: any ideas what could be causing this? If this is
> triggered by a libc6 update, should this be reassigned to glibc?
>

We went through this in Ubuntu recently and decided that restarting systemd
in glibc's postinst was the safest option:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1942276

What's happening is that systemd is running with the old glibc, forks and
then does NSS things that cause the new glibc's NSS modules to load and
they don't necessarily work, leading to failures in any unit that specifies
User=. At least for Ubuntu's builds the NSS modules seem to be ABI
compatible between 2.32 and 2.33 (I didn't try 2.31 vs 2.32) but they are
definitely not between 2.33 and 2.34.
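One way to check whether a running systemd is in the state described above is
to look for libc or NSS modules that /proc/<pid>/maps reports as "(deleted)":
the process still maps the old files, which dpkg has since replaced on disk.
Tools such as needrestart rely on a similar signal. A minimal Python sketch,
assuming the usual maps line layout (address, perms, offset, dev, inode,
pathname) and a hypothetical filename filter for libc and libnss_* objects:

```python
import os
from typing import List

DELETED_SUFFIX = " (deleted)"


def stale_libc_mappings(maps_text: str) -> List[str]:
    """Return libc/NSS shared objects that a maps listing marks as deleted."""
    stale = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        path = fields[5].rstrip()
        if not path.endswith(DELETED_SUFFIX):
            continue
        path = path[: -len(DELETED_SUFFIX)]
        name = os.path.basename(path)
        # Hypothetical filter: pre-2.34 names such as libc-2.31.so and
        # libnss_files-2.31.so, plus libc.so.6 for newer layouts.
        if name.startswith(("libc-", "libc.so", "libnss_")):
            stale.add(path)
    return sorted(stale)


def pid_has_stale_libc(pid: int = 1) -> List[str]:
    """Read /proc/<pid>/maps (needs root for PID 1) and report stale objects."""
    with open(f"/proc/{pid}/maps") as maps:
        return stale_libc_mappings(maps.read())


if __name__ == "__main__":
    try:
        print(pid_has_stale_libc(1))
    except OSError as exc:
        print(f"cannot read /proc/1/maps: {exc}")
```

A non-empty result for PID 1 matches the situation where
"systemctl daemon-reexec" was reported to fix the failing units.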

Cheers,
mwh


Re: Bug#993821: After upgrading libc, some services are unable to restart (including systemd-resolved)

2021-09-06 Thread Michael Biebl

Am 06.09.21 um 23:45 schrieb Vincent Bernat:
> Package: systemd
> Version: 247.9-1
> Severity: normal
>

> Hey!
>
> After upgrading to libc6 2.32-1, some services are unable to restart.
> In my case, systemd-resolved, systemd-timesyncd and colord. Using
> "systemctl daemon-reexec" fixes the issue. Unsure if there is really
> something to be fixed but as I didn't find anything about that, a bug
> report may help others. I suppose the problem is related to NSS.
>
> Sep 06 23:06:43 chocobo systemd[1]: Starting Network Time Synchronization...
> Sep 06 23:06:43 chocobo systemd[236983]: systemd-timesyncd.service: Failed to determine user credentials: No such process
> Sep 06 23:06:43 chocobo systemd[236983]: systemd-timesyncd.service: Failed at step USER spawning /lib/systemd/systemd-timesyncd: No such process

@libc maintainers: any ideas what could be causing this? If this is 
triggered by a libc6 update, should this be reassigned to glibc?



