Assuming this really happens at the start of an HTTP/2 worker thread: the
following change was made in revision 1874909. The stacktrace indicates a
64-bit system. Is someone making assumptions about the content of
connection->id here? The winnt MPM? Another module that freaks out? Or do I
just not see the problem...
--- httpd/httpd/branches/2.4.x/modules/http2/h2_task.c 2020/03/06 16:14:06 1874908
+++ httpd/httpd/branches/2.4.x/modules/http2/h2_task.c 2020/03/06 16:15:17 1874909
@@ -555,37 +555,36 @@ apr_status_t h2_task_do(h2_task *task, a
task->worker_started = 1;
if (c->master) {
- /* Each conn_rec->id is supposed to be unique at a point in time. Since
+ /* See the discussion at <https://github.com/icing/mod_h2/issues/195>
+ *
+ * Each conn_rec->id is supposed to be unique at a point in time. Since
* some modules (and maybe external code) uses this id as an identifier
* for the request_rec they handle, it needs to be unique for slave
* connections also.
- * The connection id is generated by the MPM and most MPMs use the formula
- * id := (child_num * max_threads) + thread_num
- * which means that there is a maximum id of about
- * idmax := max_child_count * max_threads
- * If we assume 2024 child processes with 2048 threads max, we get
- * idmax ~= 2024 * 2048 = 2 ** 22
- * On 32 bit systems, we have not much space left, but on 64 bit systems
- * (and higher?) we can use the upper 32 bits without fear of collision.
- * 32 bits is just what we need, since a connection can only handle so
- * many streams.
+ *
+ * The MPM module assigns the connection ids and mod_unique_id is using
+ * that one to generate identifier for requests. While the implementation
+ * works for HTTP/1.x, the parallel execution of several requests per
+ * connection will generate duplicate identifiers on load.
+ *
+ * The original implementation for slave connection identifiers used
+ * to shift the master connection id up and assign the stream id to the
+ * lower bits. This was cramped on 32 bit systems, but on 64bit there was
+ * enough space.
+ *
+ * As issue 195 showed, mod_unique_id only uses the lower 32 bit of the
+ * connection id, even on 64bit systems. Therefore collisions in request ids.
+ *
+ * The way master connection ids are generated, there is some space "at the
+ * top" of the lower 32 bits on allmost all systems. If you have a setup
+ * with 64k threads per child and 255 child processes, you live on the edge.
+ *
+ * The new implementation shifts 8 bits and XORs in the worker
+ * id. This will experience collisions with > 256 h2 workers and heavy
+ * load still. There seems to be no way to solve this in all possible
+ * configurations by mod_h2 alone.
*/
- int slave_id, free_bits;
-
- task->id = apr_psprintf(task->pool, "%ld-%d", c->master->id,
- task->stream_id);
- if (sizeof(unsigned long) >= 8) {
- free_bits = 32;
- slave_id = task->stream_id;
- }
- else {
- /* Assume we have a more limited number of threads/processes
- * and h2 workers on a 32-bit system. Use the worker instead
- * of the stream id. */
- free_bits = 8;
- slave_id = worker_id;
- }
- task->c->id = (c->master->id << free_bits)^slave_id;
+ task->c->id = (c->master->id << 8)^worker_id;
}
h2_beam_create(&task->output.beam, c->pool, task->stream_id, "output",
Stefan Eissing
<green/>bytes GmbH
Hafenweg 16
48155 Münster
www.greenbytes.de
> On 14.04.2020 at 14:12, Eric Covener <[email protected]> wrote:
>
> On Tue, Apr 14, 2020 at 8:09 AM Ruediger Pluem <[email protected]> wrote:
>>
>>
>>
>> On 4/14/20 12:22 PM, Steffen wrote:
>>>
>>>
>>> This is the backtrace from the post above.
>>
>> Thanks.
>>
>>>
>>> By accident I noticed that Perl comes with GDB. This might help as well.
>>> I started httpd.exe from GDB with "-X -e debug" and then requested a Perl
>>> URL in the browser.
>>>
>>> Excerpt below:
>>>
>>
>> Somehow the below wasn't visible in the original mail.
>>
>>> Thread 100 received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 4936.0x23e0]
>>> 0x00007ffbe57515d9 in libhttpd!ap_get_server_built () from X:\Apps\Apache24\bin\libhttpd.dll
>>> (gdb) bt
>>> #0 0x00007ffbe57515d9 in libhttpd!ap_get_server_built () from X:\Apps\Apache24\bin\libhttpd.dll
>>> #1 0x00007ffbe44d14aa in ?? () from X:\Apps\Apache24\modules\mod_cgi.so
>>> #2 0x00007ffbe575ee85 in libhttpd!ap_run_handler () from X:\Apps\Apache24\bin\libhttpd.dll
>>> #3 0x00007ffbe575da7f in libhttpd!ap_invoke_handler () from X:\Apps\Apache24\bin\libhttpd.dll
>>> #4 0x00007ffbe575a62a in libhttpd!ap_internal_redirect_handler () from X:\Apps\Apache24\bin\libhttpd.dll
>>> #5 0x00007ffbe575a6af in libhttpd!ap_process_request () from X:\Apps\Apache24\bin\libhttpd.dll
>>> #6 0x00007ffbe22888ef in ?? () from X:\Apps\Apache24\modules\mod_http2.so
>>> #7 0x00007ffbe5761545 in libhttpd!ap_run_process_connection () from X:\Apps\Apache24\bin\libhttpd.dll
>>> #8 0x00007ffbe22885ba in ?? () from X:\Apps\Apache24\modules\mod_http2.so
>>> #9 0x00007ffbe228c36e in ?? () from X:\Apps\Apache24\modules\mod_http2.so
>>> #10 0x00007ffbe9e30e72 in ucrtbase!_beginthreadex () from C:\Windows\System32\ucrtbase.dll
>>> #11 0x00007ffbea107bd4 in KERNEL32!BaseThreadInitThunk () from C:\Windows\System32\kernel32.dll
>>> #12 0x00007ffbebecced1 in ntdll!RtlUserThreadStart () from C:\Windows\SYSTEM32\ntdll.dll
>>> #13 0x0000000000000000 in ?? ()
>>> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>>> (gdb)
>>>
>>
>>
>> Unfortunately this stacktrace does not help. One reason might be that the
>> debugging symbols are missing.
>> It is very strange that it segfaults in ap_get_server_built, a simple
>> function that just returns a pointer to a static string constant.
>> Furthermore, ap_get_server_built is not called by mod_cgi.
>> Can the crash be reproduced with a binary that has debugging symbols,
>> which are then used to generate the stacktrace?
>> As I am not a Windows guy, I unfortunately cannot provide any
>> instructions on how to do this.
>
> My experience on Windows is that if the PDBs are not 110% right, you
> will get all kinds of misleading stuff above the first ?? in the
> displayed backtrace.