I came across the following thread about dumping stack traces with Gradle [1] and thought it might be worth sharing in case someone decides to implement something along these lines for Calcite.

Best,
Stamatis

[1] https://discuss.gradle.org/t/dump-stack-trace-for-tests/33524

On Mon, Dec 13, 2021 at 6:26 PM Jacques Nadeau <[email protected]> wrote:

> I wonder if we can create a simple shell script that runs a jstack once an
> hour (starting after one hour) and then run it using
> https://github.com/psxpaul/gradle-execfork-plugin? Since none of our jobs
> run an hour, most of the time it wouldn't do anything. In the cases where
> the job hung, we'd hopefully get a jstack.
>
> On Mon, Dec 13, 2021 at 12:17 AM Stamatis Zampetakis <[email protected]>
> wrote:
>
> > If there is a systematic way to do it I would be interested to know.
> >
> > In the past, when I encountered similar hangs in CI, what I ended up doing
> > was adding debugging commits in the PR with a thread printing stack traces
> > of other threads at some intervals.
> >
> > Best,
> > Stamatis
> >
> > On Sun, Dec 12, 2021 at 7:00 PM Jacques Nadeau <[email protected]> wrote:
> >
> > > It could be infra but I'm wondering if it is some kind of concurrency
> > > bug.
> > >
> > > Anyone know if there is a straightforward way to add a secondary process
> > > in a GitHub workflow that takes a jstack after an hour or something (if
> > > the tests run that long)? Trying to jump on an instance when this happens
> > > and do this manually sounds like an exercise in frustration.
> > >
> > > I guess another option would be to modify the Druid job to provide info
> > > on tests that are running so that we can see if it always locks on the
> > > same test.
> > >
> > > On Sat, Dec 11, 2021 at 11:39 PM Alessandro Solimando <
> > > [email protected]> wrote:
> > >
> > > > I started noticing that intermittently around a month ago. I had a
> > > > quick look back then but I could not pinpoint the root cause.
> > > >
> > > > I don't think it is expected, and I guess it comes from the test infra
> > > > setup rather than the Calcite code itself.
> > > >
> > > > On Sun, Dec 12, 2021, 05:43 Jacques Nadeau <[email protected]> wrote:
> > > >
> > > > > I see a couple of recent builds with Druid tests hanging. Is that a
> > > > > normal thing or something that has started recently?
> > > > >
> > > > > Examples:
> > > > >
> > > > > https://github.com/apache/calcite/runs/4487013505?check_suite_focus=true
> > > > >
> > > > > https://github.com/jacques-n/calcite/runs/4494836558?check_suite_focus=true
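
For reference, the approach described in the thread (a debugging thread that prints the stack traces of other threads at intervals) could look roughly like the sketch below. It is only an illustration under assumptions: the class name StackDumpWatchdog and its methods are hypothetical and not part of Calcite or Gradle, and the delay and period are placeholders. It relies only on the standard JDK calls Thread.getAllStackTraces() and ScheduledExecutorService.

```java
import java.io.PrintStream;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Calcite): periodically prints all thread stacks. */
final class StackDumpWatchdog {
  private StackDumpWatchdog() {}

  /** Formats the current stack traces of all live threads as text. */
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder("=== Thread dump ===\n");
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
    }
    return sb.toString();
  }

  /**
   * Starts a daemon thread that writes a thread dump to {@code out} after
   * {@code initialDelay}, and then every {@code period}, until shut down.
   */
  static ScheduledExecutorService start(
      long initialDelay, long period, TimeUnit unit, PrintStream out) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "stack-dump-watchdog");
      t.setDaemon(true); // the watchdog must never keep the JVM alive on its own
      return t;
    });
    scheduler.scheduleAtFixedRate(() -> out.print(dumpAllThreads()), initialDelay, period, unit);
    return scheduler;
  }
}
```

Calling something like StackDumpWatchdog.start(1, 1, TimeUnit.HOURS, System.err) from a test setup hook would print nothing for runs that finish within the hour, and would give a dump in the CI log for runs that hang. Unlike jstack, this only sees threads inside the JVM it runs in, so it would have to be started in the test JVM itself.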
