Hi Kennon,

To make debugging easier and more informative, with source and symbols available, install the cygwin-debuginfo package, and likewise the debuginfo packages for any other library dependencies or package binaries you use.

On 2026-02-25 11:18, KENNON J CONRAD via Cygwin wrote:
new_score_rank is a local variable.  The bt full output only lists each frame's
local variables.  No other thread can access this local variable.

Best Regards,

Kennon

On 02/25/2026 2:32 AM PST Duncan Roe via Cygwin wrote:

Hi Kennon,

On Tue, Feb 24, 2026 at 10:38:01AM -0800, cygwin wrote:
Hello,

    I am having a problem that is apparently related to memmove and am
looking for some advice on how to investigate further.  This winter I have been
working to simplify the GLZA source code and make it more readable.  GLZA is an
advanced open-source straight-line grammar compressor first released in
2015.  Among these changes was replacing some rather bloated code with memmove
and memset in various places.  The program started crashing occasionally,
and after extensively reviewing the changes I was unable to find a cause for
these crashes.  So I installed gdb to try to find out what was going on and was
apparently able to find the cause of the problem.  As a new gdb user, I am not
very comfortable trusting what gdb is showing, but it is pointing directly to
one of the code changes I made.  I backed out this code change and the program
has not crashed after 3 days of nearly continuous testing.

    Here is what gdb reports when I run a backtrace immediately after it
reports a SIGTRAP:

(gdb) bt full
#0 0x00007ff9dd8aa98b in KERNELBASE!DebugBreak () from /cygdrive/c/Windows/system32/KERNELBASE.dll
        No symbol table info available.
#1 0x00007ff9ca3b6417 in cygwin1!.assert () from /cygdrive/c/Windows/cygwin1.dll
        No symbol table info available.
#2 0x00007ff9ca3cfb18 in secure_getenv () from /cygdrive/c/Windows/cygwin1.dll
        No symbol table info available.
#3 0x00007ff9e03dd82d in ntdll!.chkstk () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
        No symbol table info available.
#4 0x00007ff9e038916b in ntdll!RtlRaiseException () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
        No symbol table info available.
#5 0x00007ff9e03dc9ee in ntdll!KiUserExceptionDispatcher () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
        No symbol table info available.
#6 0x00007ff9ca3b12a9 in memmove () from /cygdrive/c/Windows/cygwin1.dll
        No symbol table info available.
#7 0x0000000100409a7c in rank_scores_thread (arg=0x6ffece890010) at GLZAcompress.c:904
        new_score_rank = 2633
        new_score_lmi2 = 183964750
        new_score_pmi2 = 183964725
        rank = 4380
        max_rank = 2633
        num_symbols = 25
        new_score_lmi = 92079851
        new_score_pmi = 92079826
        thread_data_ptr = 0x6ffece890010
        max_scores = 4883
        candidates_index = 0xa00034470
        score_index = 4380
        node_score_num_symbols = 7
        num_candidates = 4381
        node_ptrs_num = 12224
        local_write_index = 12225
        rank_scores_buffer = 0x6ffece890020
        candidates = 0x6ffece990020
        score = 47.6283531
#8 0x00007ff9ca412eec in cygwin1!.getreent () from /cygdrive/c/Windows/cygwin1.dll
        No symbol table info available.
#9 0x00007ff9ca3b47d3 in cygwin1!.assert () from /cygdrive/c/Windows/cygwin1.dll
        No symbol table info available.
#10 0x0000000000000000 in ?? ()
        No symbol table info available.

GLZAcompress.c line 904 is the following, in code that runs in a separate
thread created in main:
memmove(&candidates_index[new_score_rank+1], &candidates_index[new_score_rank],
        2 * (rank - new_score_rank));
This does point directly to where a code change was made.

candidates_index is allocated in main and is never intentionally changed until
it is deallocated at the end of program execution:
if (0 == (candidates_index = (uint16_t *)malloc(max_scores * sizeof(uint16_t))))
   fprintf(stderr, "ERROR - memory allocation failed\n");
This value is passed to the thread in a structure pointed to by the thread arg.
The value 0xa00034470 for candidates_index is similar to what is reported on
subsequent runs with added code to print this value, so I don't think it's
corrupted, but I would need to reproduce the crash after checking the initial
value to be 100% certain.  With gdb reporting rank = 4380 and
new_score_rank = 2633 at the time of the SIGTRAP, this should be a backward
(destination above source) move of 1747 uint16_t values, i.e. 3494 bytes, with
the source and destination addresses only 2 bytes apart.
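One way to double-check that arithmetic is to compute the byte ranges memmove is asked to touch directly from the values gdb printed. The sketch below is hypothetical (describe_shift and shift_range are illustrative names, not GLZA code); it assumes candidates_index holds max_scores uint16_t elements, as in the malloc call above, and reports whether both ranges stay inside that allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the line 904 memmove, which shifts
 * elements [new_score_rank, rank) up by one slot. */
typedef struct {
    size_t elements;    /* number of uint16_t values moved */
    size_t bytes;       /* length argument passed to memmove */
    size_t src_offset;  /* byte offset of the source from the array start */
    size_t dst_offset;  /* byte offset of the destination from the array start */
    int in_bounds;      /* 1 if source and destination fit in max_scores elements */
} shift_range;

static shift_range describe_shift(size_t new_score_rank, size_t rank,
                                  size_t max_scores)
{
    shift_range r = {0, 0, 0, 0, 0};
    if (new_score_rank > rank)      /* length would be negative: reject */
        return r;
    r.elements = rank - new_score_rank;
    r.bytes = r.elements * sizeof(uint16_t);   /* same as 2 * (rank - new_score_rank) */
    r.src_offset = new_score_rank * sizeof(uint16_t);
    r.dst_offset = (new_score_rank + 1) * sizeof(uint16_t);
    /* The last byte written belongs to element 'rank', so rank < max_scores. */
    r.in_bounds = rank < max_scores;
    return r;
}
```

With new_score_rank = 2633, rank = 4380 and max_scores = 4883 this gives 1747 elements (3494 bytes), a source offset of 5266 bytes and a destination offset of 5268 bytes. Taking candidates_index = 0xa00034470 at face value, the destination range ends at 0xa000366aa, below the end of the 9766-byte allocation at 0xa00036a96, which suggests the arguments themselves were sane at the moment of the fault, assuming gdb's values are accurate.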

Before this code change, and again for the last 3 days since backing it out, I
have been using this code instead and have not seen any crashes:
uint16_t * score_ptr = &candidates_index[new_score_rank];
uint16_t * candidate_ptr = &candidates_index[rank];
while (candidate_ptr >= score_ptr + 8) {
  *candidate_ptr = *(candidate_ptr - 1);
  *(candidate_ptr - 1) = *(candidate_ptr - 2);
  *(candidate_ptr - 2) = *(candidate_ptr - 3);
  *(candidate_ptr - 3) = *(candidate_ptr - 4);
  *(candidate_ptr - 4) = *(candidate_ptr - 5);
  *(candidate_ptr - 5) = *(candidate_ptr - 6);
  *(candidate_ptr - 6) = *(candidate_ptr - 7);
  *(candidate_ptr - 7) = *(candidate_ptr - 8);
  candidate_ptr -= 8;
}
while (candidate_ptr > score_ptr) {
  *candidate_ptr = *(candidate_ptr - 1);
  candidate_ptr--;
}
Yes, it's bloated code that should do the same thing as the memmove, but most
importantly it has never caused any problems.  Interestingly, even this code
shows a call to memmove in the assembly output (gcc -S), but only for the second
while loop.  The assembly for the first while loop looks like this and moves 8
uint16_t values in just 5 instructions, so it is perhaps not as inefficient as
the source code looks:
.L25:
movdqu -16(%rax), %xmm1
subq $16, %rax
movups %xmm1, 2(%rax)
cmpq %rdx, %rax
jnb .L25
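To rule out a difference in meaning between the two versions, a single-threaded check like the sketch below runs the hand-written loop and the memmove call on identical data and compares the results. shift_up_loop, shift_up_memmove and shifts_agree are hypothetical wrappers around the code quoted above, and N = 4883 simply mirrors max_scores from the gdb output. This cannot reproduce a race; it only checks that the replacement is equivalent when nothing else is touching the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N 4883  /* mirrors max_scores from the gdb output */

/* The hand-written loop from the message: moves a[lo..hi-1] into a[lo+1..hi]. */
static void shift_up_loop(uint16_t *a, size_t lo, size_t hi)
{
    uint16_t *score_ptr = &a[lo];
    uint16_t *candidate_ptr = &a[hi];
    while (candidate_ptr >= score_ptr + 8) {   /* 8 elements per pass */
        *candidate_ptr = *(candidate_ptr - 1);
        *(candidate_ptr - 1) = *(candidate_ptr - 2);
        *(candidate_ptr - 2) = *(candidate_ptr - 3);
        *(candidate_ptr - 3) = *(candidate_ptr - 4);
        *(candidate_ptr - 4) = *(candidate_ptr - 5);
        *(candidate_ptr - 5) = *(candidate_ptr - 6);
        *(candidate_ptr - 6) = *(candidate_ptr - 7);
        *(candidate_ptr - 7) = *(candidate_ptr - 8);
        candidate_ptr -= 8;
    }
    while (candidate_ptr > score_ptr) {        /* remaining 0..7 elements */
        *candidate_ptr = *(candidate_ptr - 1);
        candidate_ptr--;
    }
}

/* The line 904 replacement, with the element size written as sizeof. */
static void shift_up_memmove(uint16_t *a, size_t lo, size_t hi)
{
    memmove(&a[lo + 1], &a[lo], (hi - lo) * sizeof a[0]);
}

/* Runs both versions on identical data; returns 1 if the results match and
 * the element at lo really moved to lo + 1. Requires lo <= hi < N. */
static int shifts_agree(size_t lo, size_t hi)
{
    static uint16_t x[N], y[N];
    for (size_t i = 0; i < N; i++)
        x[i] = y[i] = (uint16_t)(i * 7u + 3u);
    shift_up_loop(x, lo, hi);
    shift_up_memmove(y, lo, hi);
    if (lo < hi && y[lo + 1] != (uint16_t)(lo * 7u + 3u))
        return 0;
    return memcmp(x, y, sizeof x) == 0;
}
```

For example, shifts_agree(2633, 4380) exercises exactly the move gdb reported; if it and the other cases return 1, the two versions agree when run in isolation.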

It may or may not matter, but the code this is happening in is very CPU
intensive; there can be up to 8 threads running at the same time when this
problem occurs.  The problem doesn't occur consistently; it seems rather
random.  The program runs about 500 iterations of ranking up to the top 30,000
new grammar rule candidates over nearly 4 hours on my test case, and it has
crashed on a different iteration each time, even though the thread that seems
to be crashing should be seeing exactly the same data each time the program is
run.  The malloc'ed array address could be changing between runs; I haven't
checked that.

I find it really hard to believe there is a bug in memmove, but that seems to be
what gdb and my testing are indicating.  So I am looking for advice on how to
better understand what is causing the program to crash.  I would like to review
the code memmove is using, but have not been able to figure out how to track
that down.  Any help in understanding what code the compiler is using for
memmove would be helpful.  Are there other things I could be overlooking?  Are
there any other things I should review or report that would be helpful?  I
could try to write a simplified test case if that would be useful.
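A simplified test case along these lines might be a reasonable next step: it runs several threads at once, each repeatedly performing the same kind of insertion shift on its own malloc'ed buffer, once with memmove and once with a plain one-element-at-a-time loop, and counts iterations where the two disagree. Everything here (run_stress, stress_thread, the LCG, MAX_SCORES = 4883 copied from max_scores) is hypothetical and only loosely modeled on line 904; each thread owns its buffers, so this tests memmove under concurrent CPU load, not races on shared data.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SCORES 4883   /* mirrors max_scores from the gdb output */
#define MAX_THREADS 64

typedef struct {
    unsigned seed;        /* per-thread PRNG state */
    long iterations;      /* number of insertion shifts to perform */
    long mismatches;      /* output; -1 on allocation failure */
} stress_arg;

/* Simple LCG so every run sees the same sequence of shifts. */
static unsigned next_rand(unsigned *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

static void *stress_thread(void *p)
{
    stress_arg *arg = p;
    uint16_t *a = malloc(MAX_SCORES * sizeof *a);   /* shifted with memmove */
    uint16_t *b = malloc(MAX_SCORES * sizeof *b);   /* shifted with a plain loop */
    arg->mismatches = 0;
    if (a == NULL || b == NULL) {
        arg->mismatches = -1;
        free(a);
        free(b);
        return NULL;
    }
    for (size_t i = 0; i < MAX_SCORES; i++)
        a[i] = b[i] = (uint16_t)i;
    for (long it = 0; it < arg->iterations; it++) {
        size_t rank = next_rand(&arg->seed) % MAX_SCORES;
        size_t lo = rank ? next_rand(&arg->seed) % rank : 0;   /* lo <= rank */
        uint16_t v = (uint16_t)next_rand(&arg->seed);

        memmove(&a[lo + 1], &a[lo], (rank - lo) * sizeof a[0]);
        for (size_t i = rank; i > lo; i--)
            b[i] = b[i - 1];
        a[lo] = b[lo] = v;   /* "insert" a new value into the freed slot */

        if (memcmp(a, b, MAX_SCORES * sizeof a[0]) != 0) {
            arg->mismatches++;
            memcpy(a, b, MAX_SCORES * sizeof a[0]);   /* resync and keep going */
        }
    }
    free(a);
    free(b);
    return NULL;
}

/* Runs nthreads workers at once; returns total mismatches, or -1 on error. */
static long run_stress(int nthreads, long iterations)
{
    pthread_t tid[MAX_THREADS];
    stress_arg args[MAX_THREADS];
    int started = 0, failed = 0;
    long total = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    while (started < nthreads) {
        args[started] = (stress_arg){ 12345u + (unsigned)started, iterations, 0 };
        if (pthread_create(&tid[started], NULL, stress_thread, &args[started]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        if (args[t].mismatches < 0)
            failed = 1;
        else
            total += args[t].mismatches;
    }
    return failed ? -1 : total;
}
```

Calling something like run_stress(8, 10000000) from a small main() (built with gcc -O2 -pthread) would roughly approximate 8 busy threads. If it runs for hours with zero mismatches while GLZA still crashes, that would point more toward something else writing to or freeing candidates_index than toward memmove itself.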

Best Regards,

Kennon Conrad




--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple

The memmove() call accesses new_score_rank 3 times while the old code only
accessed it once. Is it possible that another CPU alters new_score_rank between
these accesses?

You could eliminate that possibility by making a local copy of new_score_rank
and using that in the memmove() call. Worth a try?
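A minimal sketch of that suggestion, assuming for illustration that the rank value might be shared: insert_shift is a hypothetical wrapper, and shared_rank stands in for new_score_rank. The value is read exactly once into a const local, range-checked, and only the local is used for the index and the length. The volatile qualifier only forces that single real load; it is not a substitute for proper synchronization, and since Kennon's reply says new_score_rank is a local, this mainly serves to rule the theory out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper for the line 904 call.
 * Returns 0 on success, -1 if the snapshot of the rank is out of range. */
static int insert_shift(uint16_t *candidates_index,
                        const volatile int *shared_rank, int rank)
{
    const int nsr = *shared_rank;   /* single read; every use below sees this value */
    if (nsr < 0 || nsr > rank)
        return -1;
    memmove(&candidates_index[nsr + 1], &candidates_index[nsr],
            (size_t)(rank - nsr) * sizeof candidates_index[0]);
    return 0;
}
```

For example, with {10, 11, 12, 13, 14, 15, 16, 17}, a rank value of 2 and rank = 5, the array becomes {10, 11, 12, 12, 13, 14, 16, 17}.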

Cheers ... Duncan.




--
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retrancher  but when there is no more to cut
                                -- Antoine de Saint-Exupéry
