On 24 Feb 2026, at 19:38, KENNON J CONRAD via Cygwin <[email protected]> wrote:
>
> I am having a problem that is apparently related to memmove, and I am
> looking for some advice on how to investigate further. This winter I have
> been working to simplify the GLZA source code and make it more readable. GLZA
> is an advanced open-source straight-line grammar compressor first released
> in 2015. Among these changes was replacing some rather bloated code with
> memmove and memset in various locations. The program started crashing
> occasionally, and after extensively reviewing the changes, I was unable to
> find a cause for these crashes. So I installed gdb to try to find out what
> was going on, and was apparently able to find the cause of the problem. As a
> new gdb user, I am not very comfortable trusting what gdb is showing, but it
> is pointing directly to one of the code changes I made. I backed out this
> code change, and the program has not crashed after 3 days of nearly
> continuous testing.
>
> So here is what gdb reports when backtrace is run immediately after it
> reports a "SIGTRAP":
>
> (gdb) bt full
> #0 0x00007ff9dd8aa98b in KERNELBASE!DebugBreak () from /cygdrive/c/Windows/system32/KERNELBASE.dll
> No symbol table info available.
> #1 0x00007ff9ca3b6417 in cygwin1!.assert () from /cygdrive/c/Windows/cygwin1.dll
> No symbol table info available.
> #2 0x00007ff9ca3cfb18 in secure_getenv () from /cygdrive/c/Windows/cygwin1.dll
> No symbol table info available.
> #3 0x00007ff9e03dd82d in ntdll!.chkstk () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> No symbol table info available.
> #4 0x00007ff9e038916b in ntdll!RtlRaiseException () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> No symbol table info available.
> #5 0x00007ff9e03dc9ee in ntdll!KiUserExceptionDispatcher () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> No symbol table info available.
> #6 0x00007ff9ca3b12a9 in memmove () from /cygdrive/c/Windows/cygwin1.dll
> No symbol table info available.
> #7 0x0000000100409a7c in rank_scores_thread (arg=0x6ffece890010) at GLZAcompress.c:904
>         new_score_rank = 2633
>         new_score_lmi2 = 183964750
>         new_score_pmi2 = 183964725
>         rank = 4380
>         max_rank = 2633
>         num_symbols = 25
>         new_score_lmi = 92079851
>         new_score_pmi = 92079826
>         thread_data_ptr = 0x6ffece890010
>         max_scores = 4883
>         candidates_index = 0xa00034470
>         score_index = 4380
>         node_score_num_symbols = 7
>         num_candidates = 4381
>         node_ptrs_num = 12224
>         local_write_index = 12225
>         rank_scores_buffer = 0x6ffece890020
>         candidates = 0x6ffece990020
>         score = 47.6283531
> #8 0x00007ff9ca412eec in cygwin1!.getreent () from /cygdrive/c/Windows/cygwin1.dll
> No symbol table info available.
> #9 0x00007ff9ca3b47d3 in cygwin1!.assert () from /cygdrive/c/Windows/cygwin1.dll
> No symbol table info available.
> #10 0x0000000000000000 in ?? ()
> No symbol table info available.
>
> GLZAcompress.c line 904 is shown below. It is in code that runs in a separate
> thread created in main:
>   memmove(&candidates_index[new_score_rank+1],
>       &candidates_index[new_score_rank], 2 * (rank - new_score_rank));
> This points directly to where a code change was made.
>
> candidates_index is allocated in main and never intentionally changed
> until it is deallocated at the end of program execution:
>   if (0 == (candidates_index = (uint16_t *)malloc(max_scores *
>       sizeof(uint16_t))))
>     fprintf(stderr, "ERROR - memory allocation failed\n");
> This value is passed to the thread in a structure pointed to by the thread
> arg. The value 0xa00034470 for candidates_index is similar to what is
> reported on subsequent runs with added code to print this value, so I don't
> think it's corrupted. To be 100% certain, though, I would need to reproduce
> the crash after checking the initial value.
> With gdb reporting rank = 4380 and new_score_rank = 2633 at the time of the
> SIGTRAP, this should be a backward move of 1747 uint16_t values by 2 bytes,
> with a 2-byte difference between the source and destination addresses.
>
> Before this code change, and again for the last 3 days, I have been using
> this code instead and have not seen any crashes:
>   uint16_t * score_ptr = &candidates_index[new_score_rank];
>   uint16_t * candidate_ptr = &candidates_index[rank];
>   while (candidate_ptr >= score_ptr + 8) {
>     *candidate_ptr = *(candidate_ptr - 1);
>     *(candidate_ptr - 1) = *(candidate_ptr - 2);
>     *(candidate_ptr - 2) = *(candidate_ptr - 3);
>     *(candidate_ptr - 3) = *(candidate_ptr - 4);
>     *(candidate_ptr - 4) = *(candidate_ptr - 5);
>     *(candidate_ptr - 5) = *(candidate_ptr - 6);
>     *(candidate_ptr - 6) = *(candidate_ptr - 7);
>     *(candidate_ptr - 7) = *(candidate_ptr - 8);
>     candidate_ptr -= 8;
>   }
>   while (candidate_ptr > score_ptr) {
>     *candidate_ptr = *(candidate_ptr - 1);
>     candidate_ptr--;
>   }
> Yes, it's bloated code that should do the same thing as the memmove. Most
> importantly, though, it has never caused any problems. Interestingly, even
> this code shows memmove in the assembly output (gcc -S), but only for the
> second while loop. The first while loop compiles to the code below, which
> moves 8 uint16_t values in just 5 instructions, so it is perhaps not as
> inefficient as the source code looks:
> .L25:
>         movdqu  -16(%rax), %xmm1
>         subq    $16, %rax
>         movups  %xmm1, 2(%rax)
>         cmpq    %rdx, %rax
>         jnb     .L25
>
> It may or may not matter, but the code where this happens is very CPU
> intensive: up to 8 threads can be running at the same time when this
> problem occurs. The problem doesn't occur consistently and seems
> rather random.
> The program runs about 500 iterations of ranking up to the
> top 30,000 new grammar rule candidates over nearly 4 hours on my test case.
> Each time it has crashed, the crash has happened on a different iteration,
> even though the thread that seems to be crashing should be seeing exactly
> the same data each time the program is run. The malloc'ed array address
> could be changing between runs. I haven't checked that.
>
> I find it really hard to believe there is a bug in memmove, but that seems to
> be what gdb and my testing are indicating. So I am looking for advice on how
> to better understand what is causing the program to crash. I would like to
> review the code memmove is using, but have not been able to figure out how to
> track that down. Any help in understanding what code the compiler is using
> for memmove would be helpful. Are there other things I could be
> overlooking? Are there any other things I should review or report that would
> be helpful? I could try to write a simplified test case if that would be
> useful.

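For what it's worth, the memmove arithmetic looks right to me. Both versions move candidates_index[new_score_rank] through candidates_index[rank - 1] up by one slot. That is rank - new_score_rank = 4380 - 2633 = 1747 uint16_t values, or 3494 bytes, and the highest index written (4380) is inside the 4883-element allocation. If you want to rule the call itself in or out, a single-threaded harness along these lines compares the two versions on identical data. It is only a sketch. shift_up_loop, shift_up_memmove, and shifts_agree are made-up names. The loop body is copied from your message, and the byte count is written as (hi - lo) * sizeof a[0], which equals your 2 * (rank - new_score_rank) for uint16_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Loop from the report: moves a[lo..hi-1] up one slot into a[lo+1..hi],
   eight elements per iteration, then one at a time. */
static void shift_up_loop(uint16_t *a, size_t lo, size_t hi)
{
    uint16_t *score_ptr = &a[lo];
    uint16_t *candidate_ptr = &a[hi];
    while (candidate_ptr >= score_ptr + 8) {
        *candidate_ptr = *(candidate_ptr - 1);
        *(candidate_ptr - 1) = *(candidate_ptr - 2);
        *(candidate_ptr - 2) = *(candidate_ptr - 3);
        *(candidate_ptr - 3) = *(candidate_ptr - 4);
        *(candidate_ptr - 4) = *(candidate_ptr - 5);
        *(candidate_ptr - 5) = *(candidate_ptr - 6);
        *(candidate_ptr - 6) = *(candidate_ptr - 7);
        *(candidate_ptr - 7) = *(candidate_ptr - 8);
        candidate_ptr -= 8;
    }
    while (candidate_ptr > score_ptr) {
        *candidate_ptr = *(candidate_ptr - 1);
        candidate_ptr--;
    }
}

/* The memmove form from line 904; (hi - lo) * sizeof a[0] is the same
   byte count as 2 * (hi - lo) for uint16_t. */
static void shift_up_memmove(uint16_t *a, size_t lo, size_t hi)
{
    memmove(&a[lo + 1], &a[lo], (hi - lo) * sizeof a[0]);
}

/* Applies both shifts to identical n-element arrays. Returns 1 if the
   results match, 0 if they differ, -1 if allocation fails. */
int shifts_agree(size_t n, size_t lo, size_t hi)
{
    assert(lo <= hi && hi < n);   /* same preconditions as line 904 */
    uint16_t *x = malloc(n * sizeof *x);
    uint16_t *y = malloc(n * sizeof *y);
    int result = -1;
    if (x != NULL && y != NULL) {
        /* Odd multiplier: values are distinct for n <= 65536, so any
           misplaced element shows up in the comparison. */
        for (size_t i = 0; i < n; i++)
            x[i] = y[i] = (uint16_t)(i * 40503u);
        shift_up_loop(x, lo, hi);
        shift_up_memmove(y, lo, hi);
        result = memcmp(x, y, n * sizeof *x) == 0;
    }
    free(x);
    free(y);
    return result;
}
```

Called from a small test main as shifts_agree(4883, 2633, 4380), this should return 1 if memmove behaves as specified. If it does, and the crash still only shows up under load with several threads running, it may be worth double-checking whether anything else can write to or free candidates_index (or the struct that carries it) while rank_scores_thread is shifting it.
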
Is there any way you can use AddressSanitizer or UndefinedBehaviorSanitizer to double-check that you're not doing anything undefined? In this type of code, it is very easy to miss small off-by-one errors and similar mistakes.

As far as I know, Cygwin's gcc does not have AddressSanitizer, but if you can compile the same code with Visual Studio, you can use its AddressSanitizer.

-Dimitry
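P.S. If AddressSanitizer really isn't available in your Cygwin toolchain, a cheap stopgap is to route the shift through a bounds-checked wrapper in debug builds. That way a bad rank or new_score_rank stops at an assert before memmove touches memory outside the buffer. checked_shift_up_u16 is a hypothetical helper, not something from GLZA, and n is meant to be the element count of the allocation (max_scores in your case). It only catches index errors at this one call site. Unlike ASan, it won't catch use-after-free or overflows elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical debug helper (not part of GLZA): moves a[lo..hi-1] up one
   slot into a[lo+1..hi] after checking the indices against n, the number
   of elements in the allocation. The asserts vanish if NDEBUG is defined. */
void checked_shift_up_u16(uint16_t *a, size_t n, size_t lo, size_t hi)
{
    assert(a != NULL);
    assert(lo <= hi);   /* hi < lo would make (hi - lo) wrap to a huge size_t */
    assert(hi < n);     /* a[hi] is the last element written */
    if (hi > lo)
        memmove(&a[lo + 1], &a[lo], (hi - lo) * sizeof a[0]);
}
```

At line 904, the call would then read checked_shift_up_u16(candidates_index, max_scores, new_score_rank, rank). Compile it without -DNDEBUG so the asserts stay active.
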

