On Sun, 8 Apr 2018 13:50:14 +0900
Carsten Haitzler (The Rasterman) <ras...@rasterman.com> wrote:
>
> terminate sends a term signal. process may trap this and not respond
> if it so chooses. kill signals can't be trapps so kill will kill.

Sure, the parent entrance process is trapping those signals, but the
client is not. The problem is a breakdown between server and client. It
seems the ecore_exe stuff is not helping in this case.

The parent entrance process traps the termination signal, then tells the
client to stop, and that in turn stops the parent server process. The
server process stopping depends on being able to control the client
process.

I can see the signals being trapped and the call to ecore_exe_terminate.
But the client isn't terminated properly, or something is happening that
normally should not. It seems to be something funky in the Travis Ubuntu
env. I cannot replicate the same issues on live systems or via Xephyr.

> to know the result the ECORE_EXE_DEL event will happen - listen for
> it with the exe that exited and why (exit code). if this event
> doesn't come in - the app hasn't exited yet.

Something is going wrong. The client does exit, but the
handler/callback is never fired. I am doing a ps after and I can see no
entrance_client process.
https://travis-ci.org/Obsidian-StudiosInc/entrance/jobs/363681776#L1261

Yet I never see the log line from the callback:

src/daemon/entrance.c:302 _entrance_client_del() client have terminated

ecore_event_handler_add(ECORE_EXE_EVENT_DEL, _entrance_client_del,
NULL);

Those [Xorg] <defunct> and [bash] <defunct> processes may be a sign of
the problem. The bash one is from entrance_client. But the hang may be
related to Xorg itself.
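
For reference, a <defunct> entry in ps is a process that has already
exited but whose parent has not yet collected its exit status with
wait()/waitpid(). A minimal sketch of non-blocking reaping in plain POSIX
C follows. It is not how entrance or ecore_exe is implemented, and the
helper name reap_children is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of entrance or ecore): reap every child
 * that has already exited, without blocking. Returns how many children
 * were reaped. */
int reap_children(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    /* waitpid() returns 0 when children exist but none has exited yet,
     * and -1 (ECHILD) when there are no children at all. */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        if (WIFEXITED(status))
            printf("pid %ld exited with code %d\n",
                   (long)pid, WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            printf("pid %ld killed by signal %d\n",
                   (long)pid, WTERMSIG(status));
        reaped++;
    }
    return reaped;
}
```

A process that stays <defunct> for a long time therefore points at its
parent not reaping it, rather than at the process itself still running.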

>  if the process has  exited already these will have no effect. it
> could be the process is stuck in a kernel syscall (does happen) and
> the kernel is refusing to end the process.

It seems like something is stuck or going wrong, but with entrance, not
entrance_client. Whatever is going wrong seems to affect their
communication.

I had some trials before where I killed the client. It got stuck trying
to kill the parent or something.

This was the furthest I ever got
https://travis-ci.org/Obsidian-StudiosInc/entrance/jobs/363371263#L1242
But that got stuck and never logged "login shutdown".
https://github.com/Obsidian-StudiosInc/entrance/blob/master/src/bin/entrance_client.c#L87
The message shown comes from entrance_gui_shutdown();
https://github.com/Obsidian-StudiosInc/entrance/blob/master/src/bin/entrance_gui.c#L234
I cannot see anything there that would cause that to hang, but it never
makes it to "login shutdown".

Another, later attempt never made it past _entrance_server_del:
https://travis-ci.org/Obsidian-StudiosInc/entrance/jobs/363663070#L1359

I thought that one was due to failing to write profile files. But with
that resolved, it still seems to just hang...
https://travis-ci.org/Obsidian-StudiosInc/entrance/jobs/363681776#L1307

>  kill() (the syscall) will not tell you this. the
> only things it will tell you is if you don't have permission to kill
> the process (another uid and you are not root for example), or the
> pid does not exist.

I was thinking more about whether there was an error killing the
process, so that I could take further action: issue another kill, maybe
escalate the signal from 15 (SIGTERM) to 9 (SIGKILL), or take some other
action to kill the stuck process.
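
A rough sketch of that kind of follow-up, in plain POSIX C rather than
the ecore_exe API: send SIGTERM, poll for a grace period, then fall back
to SIGKILL. The function name stop_child and the 10 ms polling step are
assumptions for illustration, and it only works for a direct child of
the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: ask a direct child to stop with SIGTERM, give it
 * grace_ms milliseconds, then fall back to SIGKILL.
 * Returns 0 if SIGTERM was enough, 1 if SIGKILL was needed, -1 on error. */
int stop_child(pid_t pid, int grace_ms)
{
    const struct timespec step = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int status;

    if (kill(pid, SIGTERM) == -1)
        return -1;                  /* ESRCH or EPERM, see errno */

    for (int waited = 0; waited < grace_ms; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 0;               /* child is gone and reaped */
        if (r == -1)
            return -1;              /* e.g. ECHILD: not our child */
        nanosleep(&step, NULL);
    }

    /* Grace period is over: SIGKILL cannot be trapped or ignored. */
    if (kill(pid, SIGKILL) == -1)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return 1;
}
```

Note the final waitpid() blocks, so a process the kernel refuses to end
(stuck in a syscall) would still hang the caller here.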

>  i assume you know the pid exists and it's there
> as you say it's not going away...

That is what I was thinking about doing after calling
ecore_exe_terminate: see if the PID still exists and, if so, take
further action.
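
A sketch of such a check using kill() with signal 0, which performs the
existence and permission checks without delivering anything. The helper
name pid_exists is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: 1 if pid names an existing process, 0 if it does
 * not, -1 on an unexpected error. */
int pid_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;       /* exists and we may signal it */
    if (errno == EPERM)
        return 1;       /* exists, but we lack permission */
    if (errno == ESRCH)
        return 0;       /* no such process */
    return -1;
}
```

One caveat: an exited but unreaped (<defunct>) process still has its PID,
so this check reports it as existing until the parent waits on it.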

>  so either it's a different uid  (setuid?)

That is taking place in the client, but ps is showing all processes
running as root. That suid does not cause a problem on normal systems.
The only time I cannot shut down entrance normally is when a desktop
session is running, i.e. an active E desktop session.
https://github.com/Obsidian-StudiosInc/entrance/issues/5

Not sure if that is related to the problem I am having or not.

> , or it's ignoring term  signals but kill should work (unless
> permissions fail), and other than that ... it may be kernel holding
> on in a syscall. i have seen this happen often enough over the years
> and end up with an unkillable process because of it.

I do not believe it's being ignored, but something funky is happening. I
am not sure if it is related or not, but to get things to start I have
to send, via kill, a signal that I believe the X server should send
itself.

kill -SIGUSR1
https://github.com/Obsidian-StudiosInc/entrance/blob/master/tests/run-tests.sh#L25
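
For context, the Xserver(1) man page describes the convention: the
server only sends SIGUSR1 to its parent if it inherited SIGUSR1 as
SIG_IGN instead of SIG_DFL. The sketch below imitates that handshake
with a stand-in child instead of a real X server. The names fake_server
and wait_for_ready are hypothetical, and I have not checked how entrance
or run-tests.sh actually starts X.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Stand-in for the server side: notify the parent only if SIGUSR1 was
 * inherited as ignored. A real server would be exec'd; ignored signal
 * dispositions survive exec. */
static void fake_server(void)
{
    struct sigaction cur;

    sigaction(SIGUSR1, NULL, &cur);
    if (cur.sa_handler == SIG_IGN)
        kill(getppid(), SIGUSR1);
    _exit(0);
}

/* Hypothetical launcher: start the child with SIGUSR1 ignored or not,
 * then wait up to timeout_s seconds for the readiness signal.
 * Returns 1 if SIGUSR1 arrived, 0 on timeout, -1 on error. */
int wait_for_ready(int ignore_usr1, int timeout_s)
{
    struct timespec timeout = { timeout_s, 0 };
    sigset_t set, old;
    pid_t pid;
    int sig;

    /* Block SIGUSR1 so it stays pending until sigtimedwait() takes it. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_BLOCK, &set, &old);

    pid = fork();
    if (pid == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0) {
        signal(SIGUSR1, ignore_usr1 ? SIG_IGN : SIG_DFL);
        fake_server();
    }

    sig = sigtimedwait(&set, NULL, &timeout);
    waitpid(pid, NULL, 0);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return sig == SIGUSR1 ? 1 : 0;
}
```

If the launcher does not set SIGUSR1 to SIG_IGN before starting the
server, the signal is never sent, which would look like the behaviour
described here. Whether that is the actual cause is only a guess.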

I could not figure out why that was not called on its own. It could be
I am rushing things. I increased the timeout between start and kill to
30, but not really seeing any difference. Not sure if I need a delay
from start of X to stopping/killing entrance. For normal hardware I can
see it taking time to start. For virtual with dummy, not sure there
should be any delay.

There could be some general issue with signals or something else, I am
not sure.
Lots of funky issues in Travis environment with entrance. But I see
that as a good thing, as it is helping to improve error handling. It is
surely presenting some odd scenarios.

-- 
William L. Thomson Jr.

