Hi Brian,

   I installed the version of cygwin-debuginfo that is on my test computer 
onto the build machine (version 3.6.6-1).  I put the memmove back in the code in 
place of the bloated code that has been running the past 4 days without any 
problem (and the past ~10 years before changing to memmove) and got another 
SIGTRAP in gdb on that memmove within 2 hours.  The backtrace looks very similar:

#0  0x00007ff97e40a98b in KERNELBASE!DebugBreak () from 
/cygdrive/c/Windows/system32/KERNELBASE.dll
No symbol table info available.
#1  0x00007ff96ba86417 in cygwin1!.assert () from 
/cygdrive/c/Windows/cygwin1.dll
No symbol table info available.
#2  0x00007ff96ba9fb18 in secure_getenv () from /cygdrive/c/Windows/cygwin1.dll
No symbol table info available.
#3  0x00007ff980c5d82d in ntdll!.chkstk () from 
/cygdrive/c/Windows/SYSTEM32/ntdll.dll
No symbol table info available.
#4  0x00007ff980c0916b in ntdll!RtlRaiseException () from 
/cygdrive/c/Windows/SYSTEM32/ntdll.dll
No symbol table info available.
#5  0x00007ff980c5c9ee in ntdll!KiUserExceptionDispatcher () from 
/cygdrive/c/Windows/SYSTEM32/ntdll.dll
No symbol table info available.
#6  0x00007ff96ba812a9 in memmove () from /cygdrive/c/Windows/cygwin1.dll
No symbol table info available.
#7  0x00000001004099ec in rank_scores_thread (arg=0x6ffecf1c0010) at 
GLZAcompress.c:854
        new_score_rank = 767
        new_score_lmi2 = 333188156
        new_score_pmi2 = 333188149
        rank = 3360
        max_rank = 767
        num_symbols = 7
        new_score_lmi = 332397489
        new_score_pmi = 332397482
        thread_data_ptr = 0x6ffecf1c0010
        max_scores = 3361
        candidates_index = 0xa00034460
        score_index = 3319
        node_score_num_symbols = 14
        num_candidates = 3361
        node_ptrs_num = 49710
        local_write_index = 49711
        rank_scores_buffer = 0x6ffecf1c0020
        candidates = 0x6ffecf2c0020
        score = 50.3955727
#8  0x00007ff96bae2eec in cygwin1!.getreent () from 
/cygdrive/c/Windows/cygwin1.dll
No symbol table info available.
#9  0x00007ff96ba847d3 in cygwin1!.assert () from 
/cygdrive/c/Windows/cygwin1.dll
No symbol table info available.
#10 0x0000000000000000 in ?? ()
No symbol table info available.

candidates_index is 0x10 less than it was in the previously mentioned SIGTRAP.  
That is not surprising, since I made some minor unrelated code changes.

This time, it should be moving the candidates_index values (uint16_t's) to an 
address that is 2 bytes larger, starting at index 3359 and ending at index 767 
(vs. starting at index 4379 and ending at index 2633 in the previously 
mentioned SIGTRAP).

Unfortunately, I am not seeing any additional information to make debugging the 
SIGTRAP in memmove easier.  Was I supposed to install additional packages to 
debug memmove?  Are there commands in gdb to explore what happened with the 
memmove?  Any further help in debugging this problem would be greatly 
appreciated.  Unless there is more information available through gdb it seems 
the best option may be to make a simplified test program that exhibits the 
problem.

On the bright side, I am gaining confidence that the GLZA program is robust as 
long as it does not use memmove in this location.  It would have taken much 
longer to track this down without the gdb included in the Cygwin package.  I 
would have assumed memmove is a robust function, but all the evidence I have 
gathered indicates it is not robust in this use case.  I could just let it go, 
but feel there is a strong possibility that there is a bug in memmove that 
needs further investigation.  If my coding skills were better, I might know how 
to proceed, but right now I'm kind of stuck in that regard.  It seems it would 
be best to investigate the memmove code but I have not been able to locate it 
in the install package or find a way to get gdb to print useful information for 
functions in the library.  Any ideas?

Best Regards,

Kennon



> On 02/25/2026 10:28 AM PST Brian Inglis via Cygwin <[email protected]> wrote:
> 
>  
> Hi Kennon,
> 
> To make debugging easier and more informative with source and symbols install 
> the cygwin-debuginfo package, similarly with any other library dependencies, 
> or other package binaries.
> 
> On 2026-02-25 11:18, KENNON J CONRAD via Cygwin wrote:
> > new_score_rank is a local variable.  GDB only prints local variables.  No 
> > other thread can access this local variable.
> > 
> > Best Regards,
> > 
> > Kennon
> > 
> >> On 02/25/2026 2:32 AM PST Duncan Roe via Cygwin <[email protected]> wrote:
> >>
> >>   
> >> Hi Kennon,
> >>
> >> On Tue, Feb 24, 2026 at 10:38:01AM -0800, cygwin wrote:
> >>> Hello,
> >>>
> >>>     I am having a problem that is apparently related to memmove and am 
> >>> looking for some advice on how to investigate further.  This winter I 
> >>> have been working to simplify GLZA source code and make it more readable. 
> >>>  GLZA is an advanced open source code straight line grammar compressor 
> >>> first released in 2015.  Among these changes was replacing some rather 
> >>> bloated code with memmove and memset in various locations.  The program 
> >>> started crashing occasionally and after extensively reviewing the 
> >>> changes, I was unable to find a cause for these crashes.  So I installed 
> >>> gdb to try to find out what was going on and was apparently able to find 
> >>> the cause of the problem.  As a new gdb user, I am not very comfortable 
> >>> with trusting the results of what gdb is showing, but it is pointing 
> >>> directly to one of the code changes I made.  I backed out of this code 
> >>> change and the program has not crashed after 3 days of nearly continuous 
> >>> testing.
> >>>
> >>>     So here is what gdb reports when backtrace is run immediately after 
> >>> reporting a "SIGTRAP":
> >>>
> >>> (gdb) bt full
> >>> #0 0x00007ff9dd8aa98b in KERNELBASE!DebugBreak () from 
> >>> /cygdrive/c/Windows/system32/KERNELBASE.dll
> >>> No symbol table info available.
> >>> #1 0x00007ff9ca3b6417 in cygwin1!.assert () from 
> >>> /cygdrive/c/Windows/cygwin1.dll
> >>> No symbol table info available.
> >>> #2 0x00007ff9ca3cfb18 in secure_getenv () from 
> >>> /cygdrive/c/Windows/cygwin1.dll
> >>> No symbol table info available.
> >>> #3 0x00007ff9e03dd82d in ntdll!.chkstk () from 
> >>> /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> >>> No symbol table info available.
> >>> #4 0x00007ff9e038916b in ntdll!RtlRaiseException () from 
> >>> /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> >>> No symbol table info available.
> >>> #5 0x00007ff9e03dc9ee in ntdll!KiUserExceptionDispatcher () from 
> >>> /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> >>> No symbol table info available.
> >>> #6 0x00007ff9ca3b12a9 in memmove () from /cygdrive/c/Windows/cygwin1.dll
> >>> No symbol table info available.
> >>> #7 0x0000000100409a7c in rank_scores_thread (arg=0x6ffece890010) at 
> >>> GLZAcompress.c:904
> >>> new_score_rank = 2633
> >>> new_score_lmi2 = 183964750
> >>> new_score_pmi2 = 183964725
> >>> rank = 4380
> >>> max_rank = 2633
> >>> num_symbols = 25
> >>> new_score_lmi = 92079851
> >>> new_score_pmi = 92079826
> >>> thread_data_ptr = 0x6ffece890010
> >>> max_scores = 4883
> >>> candidates_index = 0xa00034470
> >>> score_index = 4380
> >>> node_score_num_symbols = 7
> >>> num_candidates = 4381
> >>> node_ptrs_num = 12224
> >>> local_write_index = 12225
> >>> rank_scores_buffer = 0x6ffece890020
> >>> candidates = 0x6ffece990020
> >>> score = 47.6283531
> >>> #8 0x00007ff9ca412eec in cygwin1!.getreent () from 
> >>> /cygdrive/c/Windows/cygwin1.dll
> >>> No symbol table info available.
> >>> #9 0x00007ff9ca3b47d3 in cygwin1!.assert () from 
> >>> /cygdrive/c/Windows/cygwin1.dll
> >>> No symbol table info available.
> >>> #10 0x0000000000000000 in ?? ()
> >>> No symbol table info available.
> >>>
> >>> GLZAcompress.c line 904 is as follows and is in code that runs as a 
> >>> separate thread created in main:
> >>> memmove(&candidates_index[new_score_rank+1], 
> >>> &candidates_index[new_score_rank], 2 * (rank - new_score_rank));
> >>> This does point directly to where a code change was made.
> >>>
> >>> candidates_index is allocated in main and not ever intentionally changed 
> >>> until deallocated at the end of program execution:
> >>> if (0 == (candidates_index = (uint16_t *)malloc(max_scores * 
> >>> sizeof(uint16_t))))
> >>>    fprintf(stderr, "ERROR - memory allocation failed\n");
> >>> This value is passed to the thread in a structure pointed to by the 
> >>> thread arg.  The value 0xa00034470 for candidates_index is similar to 
> >>> what is reported on subsequent runs with added code to print this value 
> >>> so I don't think it's corrupted, but would need to duplicate the crash 
> >>> after checking the initial value to be 100% certain.  With gdb reporting 
> >>> that rank = 4380 and new_score_rank = 2633 at the time of the SIGTRAP, 
> >>> this should be a backward move of 1747 uint16_t values by 2 bytes with a 
> >>> 2 byte difference between the source and destination addresses.
> >>>
> >>> Prior to this code change and for the last 3 days I have been using this 
> >>> code instead and not seen any crashes:
> >>> uint16_t * score_ptr = &candidates_index[new_score_rank];
> >>> uint16_t * candidate_ptr = &candidates_index[rank];
> >>> while (candidate_ptr >= score_ptr + 8) {
> >>>   *candidate_ptr = *(candidate_ptr - 1);
> >>>   *(candidate_ptr - 1) = *(candidate_ptr - 2);
> >>>   *(candidate_ptr - 2) = *(candidate_ptr - 3);
> >>>   *(candidate_ptr - 3) = *(candidate_ptr - 4);
> >>>   *(candidate_ptr - 4) = *(candidate_ptr - 5);
> >>>   *(candidate_ptr - 5) = *(candidate_ptr - 6);
> >>>   *(candidate_ptr - 6) = *(candidate_ptr - 7);
> >>>   *(candidate_ptr - 7) = *(candidate_ptr - 8);
> >>>   candidate_ptr -= 8;
> >>> }
> >>> while (candidate_ptr > score_ptr) {
> >>>   *candidate_ptr = *(candidate_ptr - 1);
> >>>   candidate_ptr--;
> >>> }
> >>> Yes, it's bloated code that should do the same thing as the memmove, but 
> >>> most importantly the code has never caused any problems.  Interestingly, 
> >>> even this code shows memmove in the assembly code (gcc -S), but only for 
> >>> the second while loop.  The looping code for the first while loop looks 
> >>> like this and moves 8 uint16_t's in just 5 instructions so it is perhaps 
> >>> not as inefficient as the source code looks:
> >>> .L25:
> >>> movdqu -16(%rax), %xmm1
> >>> subq $16, %rax
> >>> movups %xmm1, 2(%rax)
> >>> cmpq %rdx, %rax
> >>> jnb .L25
> >>>
> >>> It may or may not matter, but the code this is happening on is very CPU 
> >>> intensive - there can be up to 8 threads running at the same time when 
> >>> this problem occurs.  The problem doesn't occur consistently, it seems to 
> >>> be rather random.  The program runs about 500 iterations of ranking up to 
> >>> the top 30,000 new grammar rule candidates over nearly 4 hours on my test 
> >>> case and has crashed on different iterations each time it has crashed, 
> >>> even though the thread that seems to be crashing should be seeing exactly 
> >>> the same data each time the program is run.  The malloc'ed array address 
> >>> could be changing, I haven't checked that out.
> >>>
> >>> I find it really hard to believe there is a bug in memmove but that seems 
> >>> to be what gdb and my testing are indicating.  So I am looking for advice 
> >>> on how to better understand what is causing the program to crash.  I 
> >>> would like to review the code memmove is using, but have not been able to 
> >>> figure out how to track that down.  Any help in understanding what code 
> >>> the compiler is using for memmove would be helpful.  Are there other 
> >>> things I could possibly be overlooking?  Are there any other things I 
> >>> should review or report that would be helpful?  I could try to write a 
> >>> simplified test case if that would be useful.
> >>>
> >>> Best Regards,
> >>>
> >>> Kennon Conrad
> >>>
> >>>
> >>
> >>>
> >>> --
> >>> Problem reports:      https://cygwin.com/problems.html
> >>> FAQ:                  https://cygwin.com/faq/
> >>> Documentation:        https://cygwin.com/docs.html
> >>> Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
> >>
> >> The memmove() call accesses new_score_rank 3 times while the old code only
> >> accessed it once. Is it possible that another CPU alters new_score_rank
> >> between these accesses?
> >>
> >> You could eliminate that possibility by making a local copy of
> >> new_score_rank and using that in the memmove() call. Worth a try?
> >>
> >> Cheers ... Duncan.
> >>
> > 
> 
> 
> -- 
> Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada
> 
> La perfection est atteinte                   Perfection is achieved
> non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
> mais lorsqu'il n'y a plus rien à retrancher  but when there is no more to cut
>                                  -- Antoine de Saint-Exupéry
> 
