Hi Brian,

   Thanks!  Should the cygwin-debuginfo package be installed on the computer 
that is compiling the code or on the computer that is running the code?  I 
currently have it installed only on the computer that is running the code, not 
the one that is compiling the code.

Best Regards,

Kennon

> On 02/25/2026 10:28 AM PST Brian Inglis via Cygwin <[email protected]> wrote:
> 
>  
> Hi Kennon,
> 
> To make debugging easier and more informative, with source and symbols, 
> install the cygwin-debuginfo package, and likewise the debuginfo packages for 
> any other library dependencies or other package binaries.
> 
> On 2026-02-25 11:18, KENNON J CONRAD via Cygwin wrote:
> > new_score_rank is a local variable.  GDB only prints local variables.  No 
> > other thread can access this local variable.
> > 
> > Best Regards,
> > 
> > Kennon
> > 
> >> On 02/25/2026 2:32 AM PST Duncan Roe via Cygwin <[email protected]> wrote:
> >>
> >>   
> >> Hi Kennon,
> >>
> >> On Tue, Feb 24, 2026 at 10:38:01AM -0800, cygwin wrote:
> >>> Hello,
> >>>
> >>>     I am having a problem that is apparently related to memmove and am 
> >>> looking for some advice on how to investigate further.  This winter I 
> >>> have been working to simplify GLZA source code and make it more readable.  
> >>> GLZA is an advanced open source straight-line grammar compressor 
> >>> first released in 2015.  Among these changes was replacing some rather 
> >>> bloated code with memmove and memset in various locations.  The program 
> >>> started crashing occasionally and after extensively reviewing the 
> >>> changes, I was unable to find a cause for these crashes.  So I installed 
> >>> gdb to try to find out what was going on and was apparently able to find 
> >>> the cause of the problem.  As a new gdb user, I am not very comfortable 
> >>> with trusting the results of what gdb is showing, but it is pointing 
> >>> directly to one of the code changes I made.  I backed out of this code 
> >>> change and the program has not crashed after 3 days of nearly continuous 
> >>> testing.
> >>>
> >>>     So here is what gdb reports when backtrace is run immediately after 
> >>> reporting a "SIGTRAP":
> >>>
> >>> (gdb) bt full
> >>> #0 0x00007ff9dd8aa98b in KERNELBASE!DebugBreak () from 
> >>> /cygdrive/c/Windows/system32/KERNELBASE.dll
> >>> No symbol table info available.
> >>> #1 0x00007ff9ca3b6417 in cygwin1!.assert () from 
> >>> /cygdrive/c/Windows/cygwin1.dll
> >>> No symbol table info available.
> >>> #2 0x00007ff9ca3cfb18 in secure_getenv () from 
> >>> /cygdrive/c/Windows/cygwin1.dll
> >>> No symbol table info available.
> >>> #3 0x00007ff9e03dd82d in ntdll!.chkstk () from 
> >>> /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> >>> No symbol table info available.
> >>> #4 0x00007ff9e038916b in ntdll!RtlRaiseException () from 
> >>> /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> >>> No symbol table info available.
> >>> #5 0x00007ff9e03dc9ee in ntdll!KiUserExceptionDispatcher () from 
> >>> /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> >>> No symbol table info available.
> >>> #6 0x00007ff9ca3b12a9 in memmove () from /cygdrive/c/Windows/cygwin1.dll
> >>> No symbol table info available.
> >>> #7 0x0000000100409a7c in rank_scores_thread (arg=0x6ffece890010) at 
> >>> GLZAcompress.c:904
> >>> new_score_rank = 2633
> >>> new_score_lmi2 = 183964750
> >>> new_score_pmi2 = 183964725
> >>> rank = 4380
> >>> max_rank = 2633
> >>> num_symbols = 25
> >>> new_score_lmi = 92079851
> >>> new_score_pmi = 92079826
> >>> thread_data_ptr = 0x6ffece890010
> >>> max_scores = 4883
> >>> candidates_index = 0xa00034470
> >>> score_index = 4380
> >>> node_score_num_symbols = 7
> >>> num_candidates = 4381
> >>> node_ptrs_num = 12224
> >>> local_write_index = 12225
> >>> rank_scores_buffer = 0x6ffece890020
> >>> candidates = 0x6ffece990020
> >>> score = 47.6283531
> >>> #8 0x00007ff9ca412eec in cygwin1!.getreent () from 
> >>> /cygdrive/c/Windows/cygwin1.dll
> >>> No symbol table info available.
> >>> #9 0x00007ff9ca3b47d3 in cygwin1!.assert () from 
> >>> /cygdrive/c/Windows/cygwin1.dll
> >>> No symbol table info available.
> >>> #10 0x0000000000000000 in ?? ()
> >>> No symbol table info available.
> >>>
> >>> GLZAcompress.c line 904 is as follows and is in code that runs as a 
> >>> separate thread created in main:
> >>> memmove(&candidates_index[new_score_rank+1], 
> >>> &candidates_index[new_score_rank], 2 * (rank - new_score_rank));
> >>> This does point directly to where a code change was made.
> >>>
> >>> candidates_index is allocated in main and not ever intentionally changed 
> >>> until deallocated at the end of program execution:
> >>> if (0 == (candidates_index = (uint16_t *)malloc(max_scores * 
> >>> sizeof(uint16_t))))
> >>>    fprintf(stderr, "ERROR - memory allocation failed\n");
> >>> This value is passed to the thread in a structure pointed to by the 
> >>> thread arg.  The value 0xa00034470 for candidates_index is similar to 
> >>> what is reported on subsequent runs with added code to print this value 
> >>> so I don't think it's corrupted, but would need to duplicate the crash 
> >>> after checking the initial value to be 100% certain.  With gdb reporting 
> >>> that rank = 4380 and new_score_rank = 2633 at the time of the SIGTRAP, 
> >>> this should be an overlapping move of 1747 uint16_t values (3494 bytes) 
> >>> up by one element, with a 2 byte difference between the source and 
> >>> destination addresses.
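
The arithmetic above can be checked with a small sketch. It assumes the values gdb reported (rank = 4380, new_score_rank = 2633, max_scores = 4883) are trustworthy. The helper names shift_bytes and shift_in_bounds are hypothetical and are not part of GLZA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GLZA code: number of bytes requested by
 * memmove(&a[new_rank + 1], &a[new_rank], 2 * (rank - new_rank)). */
static size_t shift_bytes(size_t rank, size_t new_rank)
{
    assert(new_rank <= rank);   /* otherwise the size_t count would wrap */
    return sizeof(uint16_t) * (rank - new_rank);
}

/* Hypothetical helper: returns 1 if the shift only touches indexes below
 * max_scores. The move reads a[new_rank .. rank-1] and writes
 * a[new_rank+1 .. rank], so the highest index touched is rank. */
static int shift_in_bounds(size_t rank, size_t new_rank, size_t max_scores)
{
    return new_rank <= rank && rank < max_scores;
}
```

With the backtrace values, shift_bytes(4380, 2633) gives 3494 and shift_in_bounds(4380, 2633, 4883) gives 1, so the request itself looks in range for the allocation, provided the locals gdb printed are accurate.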
> >>>
> >>> Prior to this code change and for the last 3 days I have been using this 
> >>> code instead and not seen any crashes:
> >>> uint16_t * score_ptr = &candidates_index[new_score_rank];
> >>> uint16_t * candidate_ptr = &candidates_index[rank];
> >>> while (candidate_ptr >= score_ptr + 8) {
> >>>   *candidate_ptr = *(candidate_ptr - 1);
> >>>   *(candidate_ptr - 1) = *(candidate_ptr - 2);
> >>>   *(candidate_ptr - 2) = *(candidate_ptr - 3);
> >>>   *(candidate_ptr - 3) = *(candidate_ptr - 4);
> >>>   *(candidate_ptr - 4) = *(candidate_ptr - 5);
> >>>   *(candidate_ptr - 5) = *(candidate_ptr - 6);
> >>>   *(candidate_ptr - 6) = *(candidate_ptr - 7);
> >>>   *(candidate_ptr - 7) = *(candidate_ptr - 8);
> >>>   candidate_ptr -= 8;
> >>> }
> >>> while (candidate_ptr > score_ptr) {
> >>>   *candidate_ptr = *(candidate_ptr - 1);
> >>>   candidate_ptr--;
> >>> }
> >>> Yes, it's bloated code that should do the same thing as the memmove, but 
> >>> most importantly the code has never caused any problems.  Interestingly, 
> >>> even this code shows memmove in the assembly code (gcc -S), but only for 
> >>> the second while loop.  The looping code for the first while loop looks 
> >>> like this and moves 8 uint16_t's in just 5 instructions, so it is perhaps 
> >>> not as inefficient as the source code looks:
> >>> .L25:
> >>> movdqu -16(%rax), %xmm1
> >>> subq $16, %rax
> >>> movups %xmm1, 2(%rax)
> >>> cmpq %rdx, %rax
> >>> jnb .L25
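
One way to check that the loop and the memmove call really are interchangeable is a single-threaded comparison along these lines. This is only a sketch: the function names are hypothetical, the array size of 4883 is taken from the max_scores value in the backtrace, the fill pattern is arbitrary, and the loop is written without the 8-way unrolling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shift a[new_rank .. rank-1] up one slot, one element at a time,
 * walking from the top down like the original loop (not unrolled). */
static void shift_up_loop(uint16_t *a, size_t new_rank, size_t rank)
{
    for (size_t i = rank; i > new_rank; i--)
        a[i] = a[i - 1];
}

/* Same shift expressed as the single memmove call from line 904. */
static void shift_up_memmove(uint16_t *a, size_t new_rank, size_t rank)
{
    memmove(&a[new_rank + 1], &a[new_rank],
            sizeof(uint16_t) * (rank - new_rank));
}

/* Returns 1 when both variants leave identical arrays behind. */
static int shifts_agree(size_t new_rank, size_t rank)
{
    enum { N = 4883 };          /* assumed size, from max_scores in gdb */
    uint16_t x[N], y[N];

    assert(new_rank <= rank && rank < N);
    for (size_t i = 0; i < N; i++)
        x[i] = y[i] = (uint16_t)(i * 7u + 1u);   /* arbitrary test pattern */

    shift_up_loop(x, new_rank, rank);
    shift_up_memmove(y, new_rank, rank);
    return memcmp(x, y, sizeof x) == 0;
}
```

Calling shifts_agree(2633, 4380) exercises the exact move from the backtrace. A pass here only shows the two forms agree in isolation; it says nothing about what happens with 8 threads running.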
> >>>
> >>> It may or may not matter, but the code this is happening on is very CPU 
> >>> intensive - there can be up to 8 threads running at the same time when 
> >>> this problem occurs.  The problem doesn't occur consistently, it seems to 
> >>> be rather random.  The program runs about 500 iterations of ranking up to 
> >>> the top 30,000 new grammar rule candidates over nearly 4 hours on my test 
> >>> case and has crashed on different iterations each time it has crashed, 
> >>> even though the thread that seems to be crashing should be seeing exactly 
> >>> the same data each time the program is run.  The malloc'ed array address 
> >>> could be changing, I haven't checked that out.
> >>>
> >>> I find it really hard to believe there is a bug in memmove but that seems 
> >>> to be what gdb and my testing are indicating.  So I am looking for advice 
> >>> on how to better understand what is causing the program to crash.  I 
> >>> would like to review the code memset is using, but have not been able to 
> >>> figure out how to track that down.  Any help in understanding what code 
> >>> the compiler is using for memmove would be helpful.  Are there other 
> >>> things I could possibly be overlooking?  Are there any other things I 
> >>> should review or report that would be helpful?  I could try to write a 
> >>> simplified test case if that would be useful.
> >>>
> >>> Best Regards,
> >>>
> >>> Kennon Conrad
> >>>
> >>>
> >>
> >>>
> >>> --
> >>> Problem reports:      https://cygwin.com/problems.html
> >>> FAQ:                  https://cygwin.com/faq/
> >>> Documentation:        https://cygwin.com/docs.html
> >>> Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
> >>
> >> The memmove() call accesses new_score_rank 3 times while the old code only
> >> accessed it once. Is it possible that another CPU alters new_score_rank
> >> between these accesses?
> >>
> >> You could eliminate that possibility by making a local copy of
> >> new_score_rank and using that in the memmove() call. Worth a try?
> >>
> >> Cheers ... Duncan.
> >>
> > 
> 
> 
> -- 
> Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada
> 
> La perfection est atteinte                   Perfection is achieved
> non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
> mais lorsqu'il n'y a plus rien à retrancher  but when there is no more to cut
>                                  -- Antoine de Saint-Exupéry
> 

