Hi,

Any feedback on this, please? It would help us decide how to proceed on our side.

Thanks,
Nahor

On Wed, Oct 22, 2025 at 10:21 AM Nahor <[email protected]> wrote:
>
> Small simplification and generalization of the issue:
> The relative location of the directory and the file is irrelevant. I
> can still reproduce the issue with the directory on one drive and the
> file on another. The flock error ("Bad address") didn't occur in some
> variations (e.g. directory on the C drive and file on the D drive),
> but given how rare that error is to begin with, I assume it's only a
> timing issue (my D drive is older than my C drive, for instance).
>
> On Tue, Oct 21, 2025 at 5:41 PM Nahor <[email protected]> wrote:
> >
> > Hi,
> >
> > There is a test in the Fish shell (tests::history::test_history_races)
> > that systematically fails when I run it. The test simulates multiple
> > processes/threads trying to write to the shell history file at the
> > same time.
> > In my case, the test freezes/deadlocks with errors like "Bad address"
> > and "Is a directory".
> > When I add a sleep, the freezes/deadlocks disappear, but the test
> > eventually fails because the fake history is not the right size.
> > See https://github.com/fish-shell/fish-shell/issues/11933 for more details.
> >
> > I wrote a test case in pure C (attached) that also triggers the issue,
> > although less reliably (it fails in roughly 30-50% of runs).
> > To compile: gcc main.c -o test.exe
> > To run: ./test.exe
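For anyone reading without the attachment, here is a hypothetical sketch of the same kind of test. It is not the attached main.c: several threads repeatedly open <tmp_dir>/append_file with O_APPEND, take an exclusive flock(), append one line, then unlock and close. The thread and iteration counts, the line contents, and the names race_worker/run_race_test are my own assumptions.

```c
#define _DEFAULT_SOURCE /* expose flock(), mkdtemp() and O_CLOEXEC */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical values, not necessarily those used in main.c. */
#define RACE_THREADS 8
#define RACE_WRITES 50

static char race_path[512];

/* Each iteration opens its own descriptor, takes an exclusive flock(),
 * appends one line, then unlocks and closes. Returns non-NULL on error. */
static void *race_worker(void *arg)
{
    (void)arg;
    static const char line[] = "entry\n";
    for (int i = 0; i < RACE_WRITES; i++) {
        int fd = open(race_path, O_WRONLY | O_APPEND | O_CREAT | O_CLOEXEC, 0644);
        if (fd < 0) {
            fprintf(stderr, "open file error: %d - %s\n\t%s\n",
                    errno, strerror(errno), race_path);
            return (void *)1;
        }
        if (flock(fd, LOCK_EX) != 0) {
            fprintf(stderr, "lock error: %d - %s\n", errno, strerror(errno));
            close(fd);
            return (void *)1;
        }
        ssize_t n = write(fd, line, sizeof line - 1);
        flock(fd, LOCK_UN);
        close(fd);
        if (n != (ssize_t)(sizeof line - 1))
            return (void *)1;
    }
    return NULL;
}

/* Runs all workers against <tmp_dir>/append_file, then cleans up.
 * Returns 0 if every open/flock/write succeeded, -1 otherwise. */
static int run_race_test(void)
{
    char tmp_dir[] = "/tmp/flockXXXXXX";
    if (mkdtemp(tmp_dir) == NULL)
        return -1;
    printf("tmp_dir: %s\n", tmp_dir);
    snprintf(race_path, sizeof race_path, "%s/append_file", tmp_dir);

    pthread_t threads[RACE_THREADS];
    int started = 0, failed = 0;
    while (started < RACE_THREADS &&
           pthread_create(&threads[started], NULL, race_worker, NULL) == 0)
        started++;
    for (int t = 0; t < started; t++) {
        void *res = NULL;
        pthread_join(threads[t], &res);
        if (res != NULL)
            failed = 1;
    }
    unlink(race_path);
    rmdir(tmp_dir);
    return (failed || started != RACE_THREADS) ? -1 : 0;
}
```

Calling run_race_test() from a main() and building with the same gcc command should work on both Linux and Cygwin; whether this cut-down version fails as often as main.c under Cygwin is an open question.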
> >
> >
> > Most failures look like this:
> >     ```
> >     $ ./test.exe
> >     tmp_dir: /tmp/flockc2Hz4c
> >     open file error: 21 - Is a directory
> >             /tmp/flockc2Hz4c/append_file
> >     assertion "file_fd >= 0" failed: file "main.c", line 49, function: thread_func
> >     Aborted
> >     ```
> > Occasionally (maybe 10% of the time), it looks like this:
> >     ```
> >     $ ./test.exe
> >     tmp_dir: /tmp/flock5Oly9J
> >     lock error: 14 - Bad address
> >     assertion "lock_res == 0" failed: file "main.c", line 38, function: thread_func
> >     Aborted
> >     ```
> >
> > I believe the Fish test freezes/deadlocks because, unlike my test, it
> > doesn't assert/crash on these errors, and the next time it accesses the
> > history file, several threads deadlock inside Cygwin internals.
> > In case it helps, here is a partial capture of the stack traces at one
> > such time:
> >     ```
> >     Thread 9
> >     #2  0x00000001800d487f in muto::acquire (this=0x1802c24c0 <lock_process::locker>, ms=ms@entry=4294967295) at /d/S/B/src/msys2-runtime/winsup/cygwin/sync.cc:84
> >     #3  0x00000001800dd6e0 in dtable::lock (this=<optimized out>) at /d/S/B/src/msys2-runtime/winsup/cygwin/local_includes/dtable.h:77
> >     #4  cygheap_fdnew::cygheap_fdnew (this=<synthetic pointer>, seed_fd=-1, lockit=true) at /d/S/B/src/msys2-runtime/winsup/cygwin/local_includes/cygheap.h:593
> >     #5  open (unix_path=0xa0002b3b0 "[...]/fish-shell/target/fish-test-home", flags=262144) at /d/S/B/src/msys2-runtime/winsup/cygwin/syscalls.cc:1576
> >
> >     Thread 10
> >     #2  0x00000001800d487f in muto::acquire (this=0x1802c24c0 <lock_process::locker>, ms=ms@entry=4294967295) at /d/S/B/src/msys2-runtime/winsup/cygwin/sync.cc:84
> >     #3  0x00000001800dd6e0 in dtable::lock (this=<optimized out>) at /d/S/B/src/msys2-runtime/winsup/cygwin/local_includes/dtable.h:77
> >     #4  cygheap_fdnew::cygheap_fdnew (this=<synthetic pointer>, seed_fd=-1, lockit=true) at /d/S/B/src/msys2-runtime/winsup/cygwin/local_includes/cygheap.h:593
> >     #5  open (unix_path=0xa0002bfe0 "[...]/fish-shell/target/fish-test-home/race_test_history.FwyAgK", flags=264706) at /d/S/B/src/msys2-runtime/winsup/cygwin/syscalls.cc:1576
> >
> >     Thread 11
> >     #2  0x00000001800670bb in inode_t::LOCK (this=0x80000ba20) at /d/S/B/src/msys2-runtime/winsup/cygwin/flock.cc:314
> >     #3  inode_t::get (dev=1881899537, ino=ino@entry=10977524092162599, create_if_missing=create_if_missing@entry=false, lock=lock@entry=true) at /d/S/B/src/msys2-runtime/winsup/cygwin/flock.cc:504
> >     #4  0x0000000180068eb1 in fhandler_base::del_my_locks (this=0x80000b810, from=on_close) at /d/S/B/src/msys2-runtime/winsup/cygwin/flock.cc:402
> >     #5  0x000000018010d5bf in fhandler_base::close_with_arch (this=0x80000b810, flag=flag@entry=-1) at /d/S/B/src/msys2-runtime/winsup/cygwin/fhandler/base.cc:1306
> >     #6  0x00000001800de36b in __close (fd=5, flag=-1) at /d/S/B/src/msys2-runtime/winsup/cygwin/syscalls.cc:1710
> >     #7  close (fd=5) at /d/S/B/src/msys2-runtime/winsup/cygwin/syscalls.cc:1722
> >
> >     Thread 12
> >     #2  0x00000001800d487f in muto::acquire (this=0x1802c24c0 <lock_process::locker>, ms=ms@entry=4294967295) at /d/S/B/src/msys2-runtime/winsup/cygwin/sync.cc:84
> >     #3  0x00000001800dd6e0 in dtable::lock (this=<optimized out>) at /d/S/B/src/msys2-runtime/winsup/cygwin/local_includes/dtable.h:77
> >     #4  cygheap_fdnew::cygheap_fdnew (this=<synthetic pointer>, seed_fd=-1, lockit=true) at /d/S/B/src/msys2-runtime/winsup/cygwin/local_includes/cygheap.h:593
> >     #5  open (unix_path=0x7ff10b488 "[...]/fish-shell/target/fish-test-home/race_test_history.pZO5DS", flags=263169) at /d/S/B/src/msys2-runtime/winsup/cygwin/syscalls.cc:1576
> >     ```
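In case it helps when reading the flags= values in those traces, here is a small decoder sketch. It assumes the values follow newlib/Cygwin's open() flag definitions as I understand them (O_CREAT 0x200, O_TRUNC 0x400, O_EXCL 0x800, O_CLOEXEC 0x40000, i.e. newlib's _FNOINHERIT), which differ from Linux's, so the table is hardcoded rather than taken from <fcntl.h>. The table is deliberately incomplete, and decode_cygwin_open_flags is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed newlib/Cygwin flag values (NOT Linux's), hence hardcoded. */
static const struct {
    unsigned value;
    const char *name;
} cygwin_open_flags[] = {
    { 0x000008, "O_APPEND" },
    { 0x000200, "O_CREAT" },
    { 0x000400, "O_TRUNC" },
    { 0x000800, "O_EXCL" },
    { 0x004000, "O_NONBLOCK" },
    { 0x010000, "O_BINARY" },
    { 0x040000, "O_CLOEXEC" },   /* newlib's _FNOINHERIT */
    { 0x100000, "O_NOFOLLOW" },
    { 0x200000, "O_DIRECTORY" },
};

/* Writes a "|"-separated decoding of flags into buf (len >= 1 bytes).
 * The low two bits are the access mode; unknown bits are printed in hex. */
static void decode_cygwin_open_flags(unsigned flags, char *buf, size_t len)
{
    static const char *const access_modes[] = {
        "O_RDONLY", "O_WRONLY", "O_RDWR", "<invalid access mode>"
    };
    snprintf(buf, len, "%s", access_modes[flags & 3u]);
    unsigned rest = flags & ~3u;
    for (size_t i = 0; i < sizeof cygwin_open_flags / sizeof cygwin_open_flags[0]; i++) {
        if (rest & cygwin_open_flags[i].value) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "|%s", cygwin_open_flags[i].name);
            rest &= ~cygwin_open_flags[i].value;
        }
    }
    if (rest != 0) {
        size_t used = strlen(buf);
        snprintf(buf + used, len - used, "|0x%x", rest);
    }
}
```

If those assumed values are right, 262144, 264706 and 263169 decode to O_RDONLY|O_CLOEXEC, O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC and O_WRONLY|O_TRUNC|O_CLOEXEC respectively.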
> > The freeze/deadlock can be reproduced with my C code by calling
> > "continue" inside the "if (lock_res != 0) {" block instead of
> > triggering the assert just after it.
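A rough sketch of that "continue" variant, as a standalone function rather than the actual main.c loop: when open() or flock() fails, the error is logged and the loop moves on to the next attempt instead of aborting. Closing the descriptor before "continue" is my assumption and may not match what main.c does; append_with_continue is a hypothetical name.

```c
#define _DEFAULT_SOURCE /* expose flock() and O_CLOEXEC */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Tries `attempts` locked appends of `text` to `path`. Failed opens or
 * locks are logged and skipped with "continue" rather than asserted.
 * Returns the number of appends that fully succeeded. */
static int append_with_continue(const char *path, const char *text, int attempts)
{
    int done = 0;
    size_t len = strlen(text);
    for (int i = 0; i < attempts; i++) {
        int fd = open(path, O_WRONLY | O_APPEND | O_CREAT | O_CLOEXEC, 0644);
        if (fd < 0) {
            fprintf(stderr, "open file error: %d - %s\n", errno, strerror(errno));
            continue;
        }
        int lock_res = flock(fd, LOCK_EX);
        if (lock_res != 0) {
            fprintf(stderr, "lock error: %d - %s\n", errno, strerror(errno));
            close(fd); /* assumption: release the descriptor before moving on */
            continue;
        }
        if (write(fd, text, len) == (ssize_t)len)
            done++;
        flock(fd, LOCK_UN);
        close(fd);
    }
    return done;
}
```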
> >
> >
> > I haven't been able to reproduce the missing data in the history file,
> > so I don't know whether it's an issue in Fish or flock occasionally
> > not locking properly. So far, the test passes on Linux and macOS.
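One way a C test could check for missing data, assuming each locked write appends exactly one line (this applies to a one-line-per-write file like the sketches above, not to Fish's own history format): count the lines afterwards and compare against writers × writes per writer. A smaller count would mean some appends were lost. count_lines is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Counts '\n'-terminated records in path; returns -1 if it can't be opened.
 * With one line per locked write, a correct run should yield exactly
 * (number of writers) * (writes per writer). */
static long count_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long lines = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}
```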
> >
> >
> > Thanks,
> > Nahor

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
