On 2024-05-08 23:53:57 (+0800), Mark Millard wrote:

On Apr 29, 2024, at 20:16, Mark Millard <mark...@yahoo.com> wrote:

On Apr 29, 2024, at 20:11, Mark Millard <mark...@yahoo.com> wrote:

On Apr 29, 2024, at 19:54, Mark Millard <mark...@yahoo.com> wrote:

On Apr 28, 2024, at 18:06, Philip Paeps <phi...@freebsd.org> wrote:

On 2024-04-18 23:14:22 (+0800), Mark Millard wrote:
On Apr 18, 2024, at 08:02, Mark Millard <mark...@yahoo.com> wrote:
void <void_at_f-m.fm> wrote on
Date: Thu, 18 Apr 2024 14:08:36 UTC :

Not sure where to post this..

The last bulk build for arm64 appears to have happened around
mid-March on ampere2. Is it broken?

main-armv7 building is broken and the last completed build
was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
gets stuck making no progress until manually forced to stop,
which leads to huge elapsed times for the incomplete builds:

[...]

My guess is that FreeBSD has something that broke after bd45bbe440, was broken as of f5f08e41aa, and was still broken at 75464941dc.


One thing of possible note:

Failing . . .

Host OSVERSION: 1500006
Jail OSVERSION: 1500014

I have finished a package builder refresh this morning. All our builder hosts (except PowerPC - I don't touch those) are now on main-n269671-feabaf8d5389 (OSVERSION 1500018).

ampere1 successfully finished its 140releng-armv7-quarterly build, so it looks like the problem with stuck builds was limited to ampere2 building main-armv7. I'll keep a close eye on this one when it starts its next build.


I see that main-armv7 started.

It queued only 31935 instead of the prior 34528 (or more): it is doing an incremental build instead of a full build. For example, pkg was not built but instead the prior build is in use. Thus bad results from the prior
build might be involved in this new build.

I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch
build for the purposes of the main-armv7 test.

Actually the test is not going to provide the information we are
after as things are.

giflib-5.2.2 failed to build, which leads to devel/doxygen being
skipped. devel/doxygen was the first one to hang up in the prior
2 failing attempts, if I remember right.

giflib-5.2.2 also causes graphics/graphviz to be skipped.
graphics/graphviz was installed just before the hangup in all of
the example hangups. So the context will not be replicated.

We need graphics/giflib to build to actually do the test.
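
To illustrate why one failed port hides the test case, here is a minimal Python sketch of skip propagation. The DEPENDENTS table is an assumption that records only the two relationships stated above (a giflib failure causing devel/doxygen and graphics/graphviz to be skipped); it is not the real ports dependency graph, and the function name is hypothetical.

```python
from collections import deque

# Assumed, simplified reverse-dependency table: only the relationships
# mentioned in this thread, not the actual ports tree.
DEPENDENTS = {
    "graphics/giflib": ["devel/doxygen", "graphics/graphviz"],
}

def skipped_after_failure(failed, dependents):
    """Return every port that gets skipped once `failed` does not build."""
    skipped = []
    seen = {failed}
    queue = deque([failed])
    while queue:
        port = queue.popleft()
        for child in dependents.get(port, []):
            if child not in seen:
                seen.add(child)
                skipped.append(child)
                queue.append(child)
    return skipped

print(skipped_after_failure("graphics/giflib", DEPENDENTS))
```

With this table, both ports involved in the earlier hangups end up in the skipped list, so the hang cannot be reproduced until giflib builds.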

Looks like:

https://cgit.freebsd.org/ports/commit/graphics/giflib?id=5007109903fc271e3ef0ba01d78781c1fed99f3f

is the fix for the graphics/giflib build failure.

Well, main-armv7 is building again and things are still
getting stuck. So much for my idea. For reference I
list the over 10-hr-so-far ones:

doxygen-1.9.6_1,2   build-depends 13:03:54
py39-pydot-2.0.0    run-depends   12:24:04
py39-pygraphviz-1.6 lib-depends   12:10:38
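
As a rough sketch of how such a list could be produced automatically, the following Python parses status lines in the three-column form shown above (package, phase, elapsed HH:MM:SS) and reports those past a threshold. The input format and the 10-hour default are assumptions taken from this message, not from any poudriere interface.

```python
from datetime import timedelta

# Sample lines copied from the message above.
STATUS = """\
doxygen-1.9.6_1,2   build-depends 13:03:54
py39-pydot-2.0.0    run-depends   12:24:04
py39-pygraphviz-1.6 lib-depends   12:10:38
"""

def parse_elapsed(text):
    """Convert an HH:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def stuck_builders(status, threshold=timedelta(hours=10)):
    """Return (package, phase, elapsed) for entries running longer than threshold."""
    hits = []
    for line in status.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore lines that are not in the assumed format
        name, phase, elapsed = fields
        duration = parse_elapsed(elapsed)
        if duration > threshold:
            hits.append((name, phase, duration))
    return hits

for name, phase, duration in stuck_builders(STATUS):
    print(name, phase, duration)
```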

"ps -alxdww" would likely be appropriate to get a copy
of the output of.

"procstat -k -k" usage and the like on stuck processes
would probably be appropriate.
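
A minimal sketch of collecting that output in one pass, assuming it runs on the builder host with sufficient privileges. The two commands are the ones named above; the PID list, the function names, and the injectable `run` parameter are hypothetical and only there for illustration.

```python
import subprocess

def diagnostic_commands(pids):
    """Build the argv lists: one process listing, then kernel stacks per PID."""
    commands = [["ps", "-alxdww"]]
    commands += [["procstat", "-k", "-k", str(pid)] for pid in pids]
    return commands

def capture(commands, run=subprocess.run):
    """Run each command and return its stdout keyed by the command line."""
    results = {}
    for argv in commands:
        proc = run(argv, capture_output=True, text=True, check=False)
        results[" ".join(argv)] = proc.stdout
    return results

# Hypothetical PIDs of the stuck processes; on the real host these would
# be taken from the ps listing.
for argv in diagnostic_commands([1234, 5678]):
    print(" ".join(argv))
```

Calling `capture(diagnostic_commands(pids))` on the stuck host would return the text of each command so it can be saved before the machine is rebooted.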

Does anyone with appropriate investigative background
have login access to ampere2 to take a look at what
is getting stuck?

This is unfortunate. I'm sure I have the appropriate background, but I'm spread very thin! I'll get as much information as I can about this machine while it's stuck, before I bounce it again.

I think it may be worth a try building those ports in isolation on ref14-aarch64, and see what they're trying to do. I'll also set up a set of refX-armv7 jails on that machine.

Hopefully we can get to the bottom of this soon. This is a very tedious failure mode.

We could also try to put an older armv7 image on the builder jail on ampere2. Depending on whether we have a sufficiently old image, that will either be very straightforward, or a very deep rabbit hole.

Thanks again for keeping an eye on this. We really should have better monitoring for stuck builds than "Mark will tell us". :-)

Philip
