Hi Mark,

On Tue, Aug 5, 2025 at 7:29 PM Mark Wielaard <m...@klomp.org> wrote:
>
> On Mon, Aug 04, 2025 at 11:20:54PM -0400, Aaron Merey wrote:
> > Implement concurrent execution of print_debug_* functions during handling
> > of -w, --debug-dump using libthread.a.
> >
> > A new `-C, --concurrency=NUM` command line option controls the maximum
> > number of threads that may be used. This value defaults to the number of 
> > CPUs.
> >
> > Job output is buffered and printed in the order that jobs were added to
> > the queue. This helps preserve the existing order of stdout output.
> >
> >       * src/readelf.c (default_concurrency): Function estimating the
> >       maximum number of threads.
> >       (parse_opt): Handle -C, --concurrency=NUM.
> >       (do_job): Entry point function for threads.
> >       (schedule_job): If thread safety is enabled, add job to the
> >       job queue.  Otherwise just run the job from the main thread.
> >       (print_debug): Pass print_debug_* function pointers and
> >       args to schedule_job. Also call run_jobs if thread safety
> >       is enabled.
> >
> > Signed-off-by: Aaron Merey <ame...@redhat.com>
> > ---
> > v5: use 1 as a minimum default number of threads instead of 2.
>
> OK, so if sched_getaffinity fails we are just single threaded.
>
> > Restore static qualifier for sort_listptr_name
>
> Ack.
>
> > On Wed, Jul 2, 2025 at 5:21 PM Mark Wielaard <m...@klomp.org> wrote:
> > > > > > +  unsigned fn = 0;
> > > > > > +#ifdef HAVE_GETRLIMIT
> > > > > > +  {
> > > > > > +    struct rlimit rlim;
> > > > > > +    int rc = getrlimit (RLIMIT_NOFILE, &rlim);
> > > > > > +    if (rc == 0)
> > > > > > +      fn = MAX ((rlim_t) 1, (rlim.rlim_cur - 100) / 2);
> > > > > > +    /* Conservatively estimate that at least 2 fds are used
> > > > > > +       by each thread.  */
> > > > > > +  }
> > > > > > +#endif
> > > > >
> > > > > Interesting. I wouldn't have thought about that. But makes sense. If
> > > > > you check available file descriptors, would it also make sense to
> > > > > check for available memory? And limit the number of jobs to mem/1024M?
> > > >
> > > > This function is copied from debuginfod.cxx where we haven't run into
> > > > issues with >1024 cores or the number of jobs exceeding mem/1024M.
> > >
> > > But is that because it was never run on systems with >1024 cores
>
> I don't want to hold up the patch for this, but I still don't really
> understand why sched_getaffinity is used and what really happens for
> really large core counts. Wouldn't it be simpler to use get_nprocs ()
> or sysconf (_SC_NPROCESSORS_ONLN)?

The decision to use sched_getaffinity in debuginfod.cxx
default_concurrency() is due to
https://sourceware.org/bugzilla/show_bug.cgi?id=29975, where
std::thread::hardware_concurrency() reports the total number of
processors instead of the number of processors actually available to
debuginfod. The reproducer given in the bug report uses `taskset -c 2
debuginfod -v` on a system with 256 cores and observes debuginfod
setting its max concurrency to 256.
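
To illustrate the difference (this is just a standalone sketch, not the
elfutils code itself): run under `taskset -c 2`, sched_getaffinity
reports one usable CPU, while get_nprocs still reports every online CPU.

/* Sketch only: compare the affinity-mask CPU count with the online
   CPU count.  */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/sysinfo.h>

int
main (void)
{
  cpu_set_t mask;
  CPU_ZERO (&mask);
  if (sched_getaffinity (0, sizeof (mask), &mask) == 0)
    printf ("CPUs available to this process: %d\n", CPU_COUNT (&mask));
  else
    printf ("sched_getaffinity failed; fall back to a single thread\n");

  printf ("CPUs online (get_nprocs): %d\n", get_nprocs ());
  return 0;
}

On the 256-core machine from the bug report the first number would be 1
and the second 256.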

>
> > > or with low memory?
> >
> > This is tricky because per-thread memory usage is very hard to predict
> > since memory usage is not directly correlated with thread count.  The
> > memory footprint across 64 threads could be far less than 1GB, while 2
> > threads in extreme cases might buffer many GB of output.
> >
> > I also have concerns about whether the memory that's available at the
> > time of check ends up being a reliable predictor of real per-thread
> > capacity, given OS-level memory overcommit and caching.
> >
> > The user does have the option to run eu-readelf in single thread
> > mode (--concurrency=1) which does not buffer output and reverts to
> > current memory usage patterns.
>
> OK, but it doesn't have to be 1GB, it could be 256M. It would be nice
> to have some kind of memory limit, but I guess it only really matters
> for huge core counts.

OK, in a follow-up patch I will include a per-thread estimate of 256M
or so to cap the number of threads on systems with high core counts
but relatively low memory.
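
Roughly something like the following (the 256M budget and the helper
below are only an illustrative sketch, not code from the tree):

/* Sketch: cap a proposed thread count by an assumed 256M-per-thread
   memory budget, using total physical memory as the bound.  */
#include <unistd.h>

unsigned
mem_capped_threads (unsigned nthreads)
{
  long pages = sysconf (_SC_PHYS_PAGES);
  long page_size = sysconf (_SC_PAGE_SIZE);
  if (pages > 0 && page_size > 0)
    {
      unsigned long long mem_per_thread = 256ULL * 1024 * 1024;
      unsigned long long max_by_mem
	= ((unsigned long long) pages * page_size) / mem_per_thread;
      if (max_by_mem < 1)
	max_by_mem = 1;
      if (max_by_mem < nthreads)
	nthreads = (unsigned) max_by_mem;
    }
  return nthreads > 0 ? nthreads : 1;
}

Whether total or currently available physical memory is the better
bound is still open, given the overcommit and caching concerns above.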

>
> > > If it is common code, maybe it should be part of threadlib that
> > > debuginfod then also uses?
> >
> > There are some differences between debuginfod's default_concurrency
> > and eu-readelf's.  Debuginfod checks std::thread::hardware_concurrency,
> > which isn't available in eu-readelf, plus it uses a different estimate
> > for the number of file descriptors per thread.  These differences could
> > be parameterized, but the common code is quite straightforward.
>
> In practice std::thread::hardware_concurrency is get_nprocs.  I still
> think it would be good to have some common (parameterized on fds, mem,
> etc.) code here, because I don't think the code is that straightforward.
> But that could be a follow-up patch.

In a follow-up patch I will refactor default_concurrency so it can be
shared by both debuginfod and eu-readelf.
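
Something along these lines, where the function name and the
fds_per_thread parameter are placeholders rather than an existing
libthread interface:

/* Sketch of a shared, parameterized default concurrency helper.  */
#define _GNU_SOURCE
#include <sched.h>
#include <sys/resource.h>

unsigned
shared_default_concurrency (unsigned fds_per_thread)
{
  /* CPUs actually available to this process; respects affinity masks
     set with e.g. taskset.  */
  unsigned long long cpus = 0;
  cpu_set_t mask;
  CPU_ZERO (&mask);
  if (sched_getaffinity (0, sizeof (mask), &mask) == 0)
    cpus = CPU_COUNT (&mask);

  /* Threads supportable by RLIMIT_NOFILE, leaving 100 fds of headroom
     for everything else.  */
  unsigned long long fds = 0;
  struct rlimit rlim;
  if (getrlimit (RLIMIT_NOFILE, &rlim) == 0
      && fds_per_thread > 0 && rlim.rlim_cur > 100)
    fds = (rlim.rlim_cur - 100) / fds_per_thread;

  unsigned long long n = cpus < fds ? cpus : fds;
  return n > 0 ? (unsigned) n : 1;
}

eu-readelf would then pass its estimate of 2 fds per thread and
debuginfod its own, which covers the differences mentioned above.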

Aaron
