https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82366

Karol Zwolak <karolzwolak7 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |karolzwolak7 at gmail dot com

--- Comment #13 from Karol Zwolak <karolzwolak7 at gmail dot com> ---
I recently encountered a very similar crash when using `std::regex`. I'm not
sure whether this is a GCC/libstdc++ bug or just a consequence of inconsistent
compilation settings, but here is what I found.

The crash originates from code like this in `locale_classes.tcc` (these links
point to GCC 13, but the logic is similar in earlier versions):

https://github.com/gcc-mirror/gcc/blob/97454afb368f79783e99eafee009c88aa4e16845/libstdc++-v3/include/bits/locale_classes.tcc#L94-L95
https://github.com/gcc-mirror/gcc/blob/97454afb368f79783e99eafee009c88aa4e16845/libstdc++-v3/include/bits/locale_classes.tcc#L139-L140
https://github.com/gcc-mirror/gcc/blob/97454afb368f79783e99eafee009c88aa4e16845/libstdc++-v3/include/bits/locale_classes.tcc#L200-L203

Relevant excerpt:

```cpp
template<typename _Facet>
inline const _Facet* __try_use_facet(const locale& __loc) _GLIBCXX_NOTHROW {
  const size_t __i = _Facet::id._M_id();
  const locale::facet** __facets = __loc._M_impl->_M_facets;

  if (__i >= __loc._M_impl->_M_facets_size || !__facets[__i])
    return 0;
  return static_cast<const _Facet*>(__facets[__i]);
}

template<typename _Facet>
inline const _Facet& use_facet(const locale& __loc) {
  if (const _Facet* __f = std::__try_use_facet<_Facet>(__loc))
    return *__f;
  __throw_bad_cast();
}
```

In my case, the crash occurred because `use_facet` failed with `std::bad_cast`
due to a mismatch in the facet ID values between the binary and a shared
library.
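
The failing path in the excerpt can be exercised with standard C++ alone, with
no ID mismatch involved, by asking a locale for a facet that was never
installed in it. This is only a sketch: `DemoFacet` and
`use_facet_throws_bad_cast` are hypothetical names invented for illustration.
It shows the same symptom (`std::bad_cast` from `std::use_facet`), not the
cause described in this comment.

```cpp
#include <cstddef>
#include <locale>
#include <typeinfo>

// Hypothetical custom facet used only for this illustration.
struct DemoFacet : std::locale::facet {
  static std::locale::id id;  // every facet type needs a static id member
  explicit DemoFacet(std::size_t refs = 0) : std::locale::facet(refs) {}
};
std::locale::id DemoFacet::id;

// Returns true if std::use_facet<DemoFacet> throws std::bad_cast for `loc`.
static bool use_facet_throws_bad_cast(const std::locale& loc) {
  try {
    (void)std::use_facet<DemoFacet>(loc);
    return false;
  } catch (const std::bad_cast&) {
    return true;
  }
}
```

Calling it with `std::locale::classic()` should report a throw, while a locale
built as `std::locale(std::locale::classic(), new DemoFacet)` should not.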

Root Cause:

* Both the main binary and a shared library use components (such as
`std::regex`) that depend on `std::locale` facets.
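* Each facet type has a static `id` object, and the value returned by
`id._M_id()` is that facet's index into the locale's facet table. It is meant
to be unique per facet type and identical everywhere in the process.
* If these components do not resolve the `id` symbol
(`std::ctype<char>::id`, for example) to the same object, they can end up with
different ID values for the same facet.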
* In my case, the binary had its own copy of `std::ctype<char>::id` in the
`.bss` section (symbol type `B`), while the library referenced the symbol
dynamically (`U`, undefined).
* As a result, when the library tried to access the facet, it got an ID
(`_M_id`) that didn’t match the facet table, leading to a `std::bad_cast`.
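
The effect of a mismatched ID on the lookup can be modeled with a toy facet
table. This is a simplified sketch, not libstdc++ code: `ToyFacet`,
`ToyLocale`, `toy_try_use_facet` and `toy_lookup_succeeds` are hypothetical
names, and the table only mimics the bounds check and null check from the
excerpt above.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a facet; not the libstdc++ type.
struct ToyFacet { const char* name; };

// Toy stand-in for the locale implementation: facets indexed by ID.
struct ToyLocale {
  std::vector<const ToyFacet*> facets;
};

// Same shape as __try_use_facet: bounds check, then null check.
static const ToyFacet* toy_try_use_facet(const ToyLocale& loc, std::size_t id) {
  if (id >= loc.facets.size() || !loc.facets[id])
    return nullptr;
  return loc.facets[id];
}

// Installs one facet under `installed_id`, then looks it up under `lookup_id`.
// The lookup only succeeds when both components agree on the ID.
static bool toy_lookup_succeeds(std::size_t installed_id, std::size_t lookup_id) {
  static const ToyFacet ctype_like{"ctype-like"};
  ToyLocale loc;
  loc.facets.assign(installed_id + 1, nullptr);
  loc.facets[installed_id] = &ctype_like;
  return toy_try_use_facet(loc, lookup_id) != nullptr;
}
```

In this model a lookup with the installing component's ID finds the facet, and
a lookup with any other ID returns null, which is the case where the real
`use_facet` calls `__throw_bad_cast()`.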

Diagnosis:

You can check for inconsistent symbols like this:

```
nm -C yourlib.so | grep 'ctype<char>::id'
nm -C your_binary | grep 'ctype<char>::id'
```

Or, if your crash is in `collate<char>`:

```
nm -C yourlib.so | grep 'collate<char>::id'
nm -C your_binary | grep 'collate<char>::id'
```

If the symbol appears with `B` in one and `U` in the other, you may have this
mismatch. It’s also possible that both define the symbol, which can still lead
to divergent facet IDs.

You can also verify at runtime:

```cpp
// Prefer printf while debugging this, because `std::cout` may itself use
// facets under the hood.
printf("ctype id: %zu\n", std::ctype<char>::id._M_id());
```

This value must be identical across all binaries and libraries. If it differs,
it means you have broken facet identity.
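
A slightly fuller version of this check could also print the address of the
`id` object, since two components that see different addresses are using
different copies of the symbol. The helper below is a sketch under
assumptions: `report_ctype_id` is a hypothetical name, it is meant to be
compiled into both the executable and each shared library, and `_M_id()` is a
libstdc++ internal, so that part is guarded by `__GLIBCXX__`.

```cpp
#include <cstddef>
#include <cstdio>
#include <locale>

// Hypothetical helper. It is `static` on purpose so that every component
// gets its own copy of the code and resolves the symbol for itself, instead
// of sharing one definition across the executable and its libraries.
static const void* report_ctype_id(const char* component) {
  void* addr = static_cast<void*>(&std::ctype<char>::id);
#ifdef __GLIBCXX__
  // _M_id() returns size_t, hence the %zu conversion.
  std::printf("[%s] &ctype<char>::id = %p, _M_id() = %zu\n",
              component, addr, std::ctype<char>::id._M_id());
#else
  std::printf("[%s] &ctype<char>::id = %p\n", component, addr);
#endif
  return addr;
}
```

Call it once from the binary and once from each library with a distinguishing
label, for example `report_ctype_id("main")`, and compare the printed lines.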

Possible fix:

To avoid this mismatch:

1. Ensure all components (binaries and shared libraries) that use `std::regex`,
`std::locale`, or locale facets are compiled with `-fPIC`.
2. Alternatively, statically link libstdc++ (if feasible), though this may
introduce other complications.

In my case, compiling the binary with `-fPIC` resolved the crash by ensuring
consistent symbol resolution. Without `-fPIC`, the binary can end up with its
own copy of static data symbols like `std::ctype<char>::id`, while shared
libraries expect to resolve those symbols dynamically via libstdc++. This
discrepancy causes the facet ID values (from `id._M_id()`) to differ between
components, leading to failures like `std::bad_cast` when `std::use_facet` is
used.

It’s worth emphasizing that all components in my case were using the same
(dynamically linked) copy of libstdc++. The issue wasn’t caused by multiple
libstdc++ instances, but rather by symbol duplication and inconsistency in
initialization across non-PIC vs PIC-compiled code.

This workaround doesn’t address the root cause (the inconsistent facet ID
assignments), but it ensures that all parts of the system share the same symbol
and initialization state at runtime, which avoids the crash.

Note: I wasn’t able to create a small standalone reproducer where facet IDs
differ. The issue only manifested in a larger setup with real binaries and
libraries. So while this solution is effective in practice, the underlying
mismatch may be subtle and depend on specific linker behavior or initialization
order.
