Hi Mark,

On Tue, Aug 5, 2025 at 7:29 PM Mark Wielaard <m...@klomp.org> wrote:
>
> On Mon, Aug 04, 2025 at 11:20:54PM -0400, Aaron Merey wrote:
> > Implement concurrent execution of print_debug_* functions during handling
> > of -w, --debug-dump using libthread.a.
> >
> > A new `-C, --concurrency=NUM` command line option controls the maximum
> > number of threads that may be used. This value defaults to the number of
> > CPUs.
> >
> > Job output is buffered and printed in the order that jobs were added to
> > the queue. This helps preserve the existing order of stdout output.
> >
> > * src/readelf.c (default_concurrency): Function estimating the
> > maximum number of threads.
> > (parse_opt): Handle -C, --concurrency=NUM.
> > (do_job): Entry point function for threads.
> > (schedule_job): If thread safety is enabled, add job to the
> > job queue. Otherwise just run the job from the main thread.
> > (print_debug): Pass print_debug_* function pointers and
> > args to schedule_job. Also call run_jobs if thread safety
> > is enabled.
> >
> > Signed-off-by: Aaron Merey <ame...@redhat.com>
> > ---
> > v5: use 1 as a minimum default number of threads instead of 2.
>
> OK, so if sched_getaffinity fails we are just single threaded.
>
> > Restore static qualifier for sort_listptr_name
>
> Ack.
>
> > On Wed, Jul 2, 2025 at 5:21 PM Mark Wielaard <m...@klomp.org> wrote:
> > > > > > +  unsigned fn = 0;
> > > > > > +#ifdef HAVE_GETRLIMIT
> > > > > > +  {
> > > > > > +    struct rlimit rlim;
> > > > > > +    int rc = getrlimit (RLIMIT_NOFILE, &rlim);
> > > > > > +    if (rc == 0)
> > > > > > +      fn = MAX ((rlim_t) 1, (rlim.rlim_cur - 100) / 2);
> > > > > > +    /* Conservatively estimate that at least 2 fds are used
> > > > > > +       by each thread.  */
> > > > > > +  }
> > > > > > +#endif
> > > > >
> > > > > Interesting. I wouldn't have thought about that. But makes sense. If
> > > > > you check available file descriptors, would it also make sense to
> > > > > check for available memory?
> > > > > And limit the number of jobs to mem/1024M?
> > > >
> > > > This function is copied from debuginfod.cxx where we haven't run into
> > > > issues with >1024 cores or the number of jobs exceeding mem/1024M.
> > >
> > > But is that because it was never run on systems with >1024 cores
>
> I don't want to hold up the patch for this, but I still don't really
> understand why sched_getaffinity is used and what really happens for
> really large core counts. Wouldn't it be simpler to use get_nprocs ()
> or sysconf (_SC_NPROCESSORS_ONLN)?
The decision to use sched_getaffinity in debuginfod.cxx's
default_concurrency() is due to
https://sourceware.org/bugzilla/show_bug.cgi?id=29975, where
std::thread::hardware_concurrency() reports the total number of
processors instead of the number of processors actually available to
debuginfod. The reproducer given in the bug report uses
`taskset -c 2 debuginfod -v` on a system with 256 cores and observes
debuginfod setting its max concurrency to 256.

> > > or with low memory?
> >
> > This is tricky because per-thread memory usage is very hard to predict
> > since memory usage is not directly correlated with thread count. The
> > memory footprint across 64 threads could be far less than 1GB, while 2
> > threads in extreme cases might buffer many GB of output.
> >
> > I also have concerns about whether the memory that's available at the
> > time of check ends up being a reliable predictor of real per-thread
> > capacity, given OS-level memory overcommit and caching.
> >
> > The user does have the option to run eu-readelf in single-threaded
> > mode (--concurrency=1), which does not buffer output and reverts to
> > current memory usage patterns.
>
> OK, but it doesn't have to be 1GB, it could be 256M. It would be nice
> to have some kind of memory limit, but I guess it only really matters
> for huge core counts.

OK, in a follow-up patch I will include a per-thread estimate of 256M or
so to cap the number of threads on systems with high core counts but
relatively low memory.

> > > If it is common code, maybe it should be part of threadlib that
> > > debuginfod then also uses?
> >
> > There are some differences between debuginfod's default_concurrency
> > and eu-readelf's. Debuginfod checks std::thread::hardware_concurrency,
> > which isn't available in eu-readelf, plus it uses a different estimate
> > for the number of file descriptors per thread. These differences could
> > be parameterized, but the common code is quite straightforward.
>
> In practice std::thread::hardware_concurrency is get_nprocs. I still
> think it would be good to have some common (parameterized on fds, mem,
> etc.) code here, because I don't think the code is that straightforward.
> But that could be a followup patch.

In a follow-up patch I will refactor default_concurrency so it can be
shared by both debuginfod and eu-readelf.

Aaron