Hi,

Sorry for mixing up the years.
lslocks only showed the make processes locking /dev/null, but it appears
that the culprit is a running dockerd daemon.
I don't understand why, but with the service disabled a blocked make
will suddenly continue.
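
For anyone who wants to check their own machine: as far as I understand
James's analysis quoted below, make -O takes an fcntl lock on stdout with
F_SETLKW, so a non-blocking attempt at the same kind of lock should fail
while something else holds a conflicting one. Here is a minimal Python
sketch, assuming Linux. probe_write_lock is a name I made up, and it uses
fcntl.lockf (a non-blocking fcntl lock) as a stand-in, not make's actual code:

```python
import errno
import fcntl
import os


def probe_write_lock(path="/dev/null"):
    """Return True if a non-blocking exclusive fcntl lock on `path` succeeds.

    False means some other process currently holds a conflicting lock,
    which is the situation in which a blocking F_SETLKW would hang.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        try:
            # LOCK_NB turns the blocking wait into an immediate error.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


if __name__ == "__main__":
    state = "free" if probe_write_lock() else "held by another process"
    print("/dev/null write lock:", state)
```

Running it once with dockerd started and once with it stopped should show
whether the lock state changes along with the daemon.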

To install the service:
echo 'deb [arch=amd64] https://apt.dockerproject.org/repo/ debian-stretch main' > /etc/apt/sources.list.d/docker.list
apt-get update; apt-get install docker-engine

For completeness, the lslocks output:
$ lslocks
COMMAND           PID   TYPE   SIZE MODE  M      START        END PATH
zeitgeist-fts    1685  POSIX  15.2M READ  0 1073741826 1073742335 /home/noppl/.local/share/zeitgeist/activity.sqlite
zeitgeist-fts    1685  POSIX    32K READ  0        128        128 /home/noppl/.local/share/zeitgeist/activity.sqlite-shm
chromium         1872  POSIX     0B WRITE 0          0          0 /home/noppl/.config/chromium/Default/data_reduction_proxy_leveldb/LOCK
chromium         1872  POSIX  16.7M WRITE 0 1073741824 1073742335 /home/noppl/.config/chromium/Default/History
atd               742  POSIX     4B WRITE 0          0          0 /run/atd.pid
tracker-store    1609  POSIX 256.5M READ  0 1073741826 1073742335 /home/noppl/.cache/tracker/meta.db
tracker-store    1609  POSIX    32K READ  0        128        128 /home/noppl/.cache/tracker/meta.db-shm
zeitgeist-datah  1593  POSIX  15.2M READ  0 1073741826 1073742335 /home/noppl/.local/share/zeitgeist/activity.sqlite
zeitgeist-datah  1593  POSIX    32K READ  0        128        128 /home/noppl/.local/share/zeitgeist/activity.sqlite-shm
chromium         1872  POSIX     0B WRITE 0          0          0 /home/noppl/.config/chromium/Default/Service Worker/Database/LOCK
chromium         1872  POSIX     0B WRITE 0          0          0 /home/noppl/.config/chromium/Default/Session Storage/LOCK
libvirtd          955  POSIX     3B WRITE 0          0          0 /run/libvirtd.pid
chromium         1872  POSIX     0B WRITE 0          0          0 /home/noppl/.config/chromium/Default/GCM Store/LOCK
zeitgeist-fts    1685 OFDLCK     0B WRITE 0          0          0 /home/noppl/.local/share/zeitgeist/fts.index/flintlock
chromium         1872  POSIX   352K WRITE 0 1073741824 1073742335 /home/noppl/.config/chromium/Default/Web Data
chromium         1872  POSIX   3.6M WRITE 0 1073741824 1073742335 /home/noppl/.config/chromium/Default/Sync Data/SyncData.sqlite3
cron              728  FLOCK     4B WRITE 0          0          0 /run/crond.pid
chromium         1872  POSIX   124K WRITE 0 1073741824 1073742335 /home/noppl/.config/chromium/Default/Login Data
chromium         1872  POSIX  13.6M READ  0 1073741826 1073742335 /home/noppl/.config/chromium/Default/Favicons
chromium         1872  POSIX     0B WRITE 0          0          0 /home/noppl/.config/chromium/Default/Extension State/LOCK
chromium         1872  POSIX     0B WRITE 0          0          0 /home/noppl/.config/chromium/Default/File System/041/t/Paths/LOCK
rpcbind           689  FLOCK     0B WRITE 0          0          0 /run/rpcbind.lock
zeitgeist-daemo  1651  POSIX  15.2M READ  0 1073741826 1073742335 /home/noppl/.local/share/zeitgeist/activity.sqlite
zeitgeist-daemo  1651  POSIX    32K READ  0        128        128 /home/noppl/.local/share/zeitgeist/activity.sqlite-shm
chromium         1872  POSIX     0B WRITE 0          0          0 /home/noppl/.config/chromium/Default/File System/Origins/LOCK
chromium         1872  POSIX   736K WRITE 0 1073741824 1073742335 /home/noppl/.config/chromium/Default/Shortcuts
dockerd          3732 OFDLCK        READ  0          0          0 /dev...
dockerd          3732  FLOCK   128K WRITE 0          0          0 /var/lib/docker/volumes/metadata.db
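
In case it helps others hunting for the lock holder: the PATH column is
cut off for the first dockerd entry (/dev...), and lslocks has a
--notruncate option that should print it in full. Below is a rough Python
sketch that runs lslocks in raw mode and keeps only the locks under a given
path prefix. The helper names (parse_lslocks, locks_under, current_locks)
and the column selection are my own, and the parser assumes that PATH is
the last column and that raw mode escapes embedded spaces, so treat it as
a starting point only:

```python
import subprocess

# Columns requested from lslocks; PATH is last so an empty path only
# shortens the line instead of shifting the other fields.
COLUMNS = ["COMMAND", "PID", "TYPE", "MODE", "PATH"]


def parse_lslocks(raw_text):
    """Turn whitespace-separated lslocks output into a list of dicts."""
    rows = []
    for line in raw_text.splitlines():
        fields = line.split()
        if len(fields) < len(COLUMNS) - 1:
            continue  # blank or unexpected line
        fields += [""] * (len(COLUMNS) - len(fields))
        rows.append(dict(zip(COLUMNS, fields[:len(COLUMNS)])))
    return rows


def locks_under(rows, prefix="/dev/"):
    """Keep only the locks whose path starts with `prefix`."""
    return [row for row in rows if row["PATH"].startswith(prefix)]


def current_locks():
    """Run lslocks and parse its output (requires util-linux)."""
    result = subprocess.run(
        ["lslocks", "--noheadings", "--raw", "--notruncate",
         "--output", ",".join(COLUMNS)],
        check=True, capture_output=True, text=True,
    )
    return parse_lslocks(result.stdout)
```

Something like locks_under(current_locks(), "/dev/") should then list every
process holding a lock on a device node, with the full path.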


2017-02-18 1:34 GMT+01:00 James Cowgill <jcowg...@debian.org>:
> Hi,
>
> On 17/02/17 18:08, Norbert Lange wrote:
>> Hello,
>>
>> Tried reproducing it at work (where it first happened on a build server).
>> On my PC at home with 4 cores / 12 threads the bug always reproduces.
>> On a 6 core / 12 thread Xeon server the bug always reproduces.
>> On my work PC with 4 cores / 4 threads running in a VMware instance it
>> doesn't reproduce.
>> All running Debian Stretch with current updates.
>>
>> Maybe you want to add info about your system?
>> From the sample of 3: Hyperthreading or >= 8 threads or running on bare
>> metal instead of in a VM could provoke the bug.
>
> Originally the system I tried it on has 8 cores (can't remember number
> of threads), but I tried it on machines with 2 cores and one with 16 and
> it worked on all of them. I don't think the number of cores is relevant
> here.
>
>> Further, make 4.1 was uploaded to Debian Stretch on 16th January, and the
>> issue appeared on 19th January on the server.
>> So disregard what I said about this not being an upstream issue - it's
>> actually quite possible.
>
> Have you muddled years up here? 4.1 was uploaded on 16th Jan *2016*.
>
>> Here's a dump via attached gdb (step won't do anything, so it seems that
>> the thread is blocked):
>>
>> (gdb) thread apply all bt
>>
>> Thread 1 (process 12177):
>> #0  0x00007f476c156962 in do_fcntl (fd=1, cmd=7, arg=0x5595eae95ea0)
>> at ../sysdeps/unix/sysv/linux/fcntl.c:31
>
> This is fcntl(stdout = /dev/null, F_SETLKW, <struct flock>)
>
> It seems that "make -O" attempts to lock stdout before writing to it so
> that multiple make processes can cooperate on who gets to write any
> output. If it's hanging here, then someone must already be holding the lock.
>
> Please can you give the output of "lslocks" on the machines that fail.
> There might be an entry for /dev/null which will point at the culprit.
> Failing that, an "strace -f" would be useful so we can see all the calls
> made to fcntl.
>
>> I'll have to compile make with debuginfo if you need more (going to take
>> a few days)
>
> I don't need any debug information, but you may be interested in this:
> https://wiki.debian.org/AutomaticDebugPackages
>
> So if you add this apt source:
> deb http://deb.debian.org/debian-debug/ unstable-debug main
>
> You can then install make-dbgsym to get the debug symbols for make
> without recompiling anything.
>
> Thanks,
> James
>
>> 2017-02-17 15:24 GMT+01:00 James Cowgill <jcowg...@debian.org>:
>>> On 16/02/17 21:52, Norbert Lange wrote:
>>>> Package: make
>>>> Version: 4.1-9
>>>> Severity: important
>>>>
>>>> Dear Maintainer,
>>>>
>>>> Running the attached Makefile will hang the process.
>>>> If multiple jobs are used, the process won't respond to a
>>>> TERM and has to be killed.
>>>>
>>>> The very same issue is observed with make-guile.
>>>>
>>>> I believe this to not be an upstream bug, since I observed this
>>>> only a couple of weeks ago after an upgrade.
>>>> Unfortunately I can't pinpoint a date or version.
>>>
>>> I cannot reproduce this bug.
>>>
>>> Also, make has not been updated in testing for almost a year so if it
>>> only started happening recently, something else probably caused it.
>>>
>>> Running 'make -d -O' (although this may be difficult if the bug requires
>>> redirection to /dev/null) or running make inside gdb and
>>> finding where it hangs might help in diagnosing this.
>>>
>>> Thanks,
>>> James
>
