Good evening,

I'm not sure whether this is a bug in my code or in Twill. I've been working
on TWILL-78, but I'm running into a basic problem: I don't seem to receive
all the log messages from my containers. You can find a copy of my toy
application here:
https://gist.github.com/erickt/7b16d695b64384015b41. I've been testing
Twill with 3 containers. Occasionally I get everything I expect, but
sometimes I only get a subset of the log messages from the worker
nodes. Here's an example. The container logs in
/var/log/hadoop-yarn/container/... on the nodes show:

node1:
...
Launching main: public static void
org.apache.twill.internal.container.TwillContainerMain.main(java.lang.String[])
throws java.lang.Exception []
2014-05-10 00:20:09,334 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@57] -
entering barrier
2014-05-10 00:20:09,352 - WARN
[ConnectionStateManager-0:o.a.c.f.s.ConnectionStateManager@212] - There are
no ConnectionStateListeners registered.
2014-05-10 00:20:11,187 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@66] - in
barrier
2014-05-10 00:20:11,188 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@76] - woken up
2014-05-10 00:20:11,830 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@83] - out of
barrier
2014-05-10 00:20:11,831 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@93] - done
Main class completed.
Launcher completed
Cleanup directory tmp/twill.launcher-1399681206035-0
----

node2:
...
Launching main: public static void
org.apache.twill.internal.container.TwillContainerMain.main(java.lang.String[])
throws java.lang.Exception []
2014-05-10 00:20:00,133 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@57] -
entering barrier
2014-05-10 00:20:00,158 - WARN
[ConnectionStateManager-0:o.a.c.f.s.ConnectionStateManager@212] - There are
no ConnectionStateListeners registered.
2014-05-10 00:20:02,161 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@66] - in
barrier
2014-05-10 00:20:02,161 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@76] - woken up
2014-05-10 00:20:02,979 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@83] - out of
barrier
2014-05-10 00:20:02,979 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@93] - done
Main class completed.
Launcher completed
Cleanup directory tmp/twill.launcher-1399681196232-0
----

node3:
...
Launching main: public static void
org.apache.twill.internal.container.TwillContainerMain.main(java.lang.String[])
throws java.lang.Exception []
2014-05-10 00:20:01,191 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@57] -
entering barrier
2014-05-10 00:20:01,197 - WARN
[ConnectionStateManager-0:o.a.c.f.s.ConnectionStateManager@212] - There are
no ConnectionStateListeners registered.
2014-05-10 00:20:02,768 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@66] - in
barrier
2014-05-10 00:20:02,769 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@76] - woken up
2014-05-10 00:20:03,587 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@83] - out of
barrier
2014-05-10 00:20:03,588 - ERROR
[ServiceDelegate:o.l.g.t.GraphlabApplication$GraphlabRunnable@93] - done
Main class completed.
Launcher completed
Cleanup directory tmp/twill.launcher-1399681197698-0
----

But in this case the driver script only shows the output from 2 of the 3 nodes:

----
2014-05-09 17:19:15,137 - WARN  [main:o.a.h.u.NativeCodeLoader@62] - Unable
to load native-hadoop library for your platform... using builtin-java
classes where applicable
2014-05-09 17:19:16,297 - ERROR [main:o.l.g.t.GraphlabApplication@136] -
before getting completion
2014-05-10T00:20:00,133Z ERROR o.l.g.t.GraphlabApplication [node1]
[ServiceDelegate]
GraphlabApplication$GraphlabRunnable:run(GraphlabApplication.java:57) -
entering barrier
2014-05-10T00:20:09,334Z ERROR o.l.g.t.GraphlabApplication [node2]
[ServiceDelegate]
GraphlabApplication$GraphlabRunnable:run(GraphlabApplication.java:57) -
entering barrier
2014-05-10T00:20:09,352Z WARN  o.a.c.f.s.ConnectionStateManager [node2]
[ConnectionStateManager-0]
ConnectionStateManager:processEvents(ConnectionStateManager.java:212) -
There are no ConnectionStateListeners registered.
2014-05-10T00:20:11,187Z ERROR o.l.g.t.GraphlabApplication [node2]
[ServiceDelegate]
GraphlabApplication$GraphlabRunnable:run(GraphlabApplication.java:66) - in
barrier
2014-05-10T00:20:11,188Z ERROR o.l.g.t.GraphlabApplication [node2]
[ServiceDelegate]
GraphlabApplication$GraphlabRunnable:run(GraphlabApplication.java:76) -
woken up
2014-05-10T00:20:11,830Z ERROR o.l.g.t.GraphlabApplication [node2]
[ServiceDelegate]
GraphlabApplication$GraphlabRunnable:run(GraphlabApplication.java:83) - out
of barrier
2014-05-10T00:20:11,831Z ERROR o.l.g.t.GraphlabApplication [node2]
[ServiceDelegate]
GraphlabApplication$GraphlabRunnable:run(GraphlabApplication.java:93) - done
2014-05-10T00:20:00,158Z WARN  o.a.c.f.s.ConnectionStateManager [node1]
[ConnectionStateManager-0]
ConnectionStateManager:processEvents(ConnectionStateManager.java:212) -
There are no ConnectionStateListeners registered.
2014-05-10T00:20:02,161Z ERROR o.l.g.t.GraphlabApplication [node1]
[ServiceDelegate]
GraphlabApplication$GraphlabRunnable:run(GraphlabApplication.java:66) - in
barrier
2014-05-10T00:20:02,161Z ERROR o.l.g.t.GraphlabApplication [node1]
[ServiceDelegate]
GraphlabApplication$GraphlabRunnable:run(GraphlabApplication.java:76) -
woken up
2014-05-10T00:20:02,979Z ERROR o.l.g.t.GraphlabApplication [node1]
[ServiceDelegate]
GraphlabApplication$GraphlabRunnable:run(GraphlabApplication.java:83) - out
of barrier
2014-05-10T00:20:02,979Z ERROR o.l.g.t.GraphlabApplication [node1]
[ServiceDelegate]
GraphlabApplication$GraphlabRunnable:run(GraphlabApplication.java:93) - done
14/05/09 17:20:13 INFO consumer.SimpleConsumer: Reconnect due to socket
error: Connection reset by peer
2014-05-09 17:20:13,310 - ERROR [main:o.l.g.t.GraphlabApplication@144] -
after shutting down
2014-05-09 17:20:13,312 - ERROR [Thread-3:o.l.g.t.GraphlabApplication$1@130]
- shutting down
----
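
To make the gap easier to spot across runs, the driver output can be tallied per host. This is only a minimal sketch: it assumes each forwarded record begins with a line shaped like the ones above (ISO timestamp ending in Z, level, logger, then the host in square brackets). The names RECORD, count_by_host and missing_hosts are made up for this illustration and are not part of Twill.

```python
import re
from collections import Counter

# Assumed shape of the first line of a forwarded record in the driver output:
#   2014-05-10T00:20:00,133Z ERROR o.l.g.t.GraphlabApplication [node1]
# Wrapped continuation lines and the driver's own local log lines
# ("2014-05-09 17:19:16,297 - ERROR [main:...]") do not match and are skipped.
RECORD = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}Z\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+\[(?P<host>[^\]]+)\]"
)


def count_by_host(lines):
    """Count forwarded log records per host name."""
    counts = Counter()
    for line in lines:
        match = RECORD.match(line)
        if match:
            counts[match.group("host")] += 1
    return counts


def missing_hosts(lines, expected_hosts):
    """Return the expected hosts that never appear in the driver output."""
    seen = count_by_host(lines)
    return sorted(host for host in expected_hosts if host not in seen)


# A few lines copied from the driver output above.
sample = [
    "2014-05-09 17:19:16,297 - ERROR [main:o.l.g.t.GraphlabApplication@136] -",
    "2014-05-10T00:20:00,133Z ERROR o.l.g.t.GraphlabApplication [node1]",
    "[ServiceDelegate]",
    "2014-05-10T00:20:09,334Z ERROR o.l.g.t.GraphlabApplication [node2]",
    "2014-05-10T00:20:09,352Z WARN  o.a.c.f.s.ConnectionStateManager [node2]",
]

if __name__ == "__main__":
    print(count_by_host(sample))
    print(missing_hosts(sample, ["node1", "node2", "node3"]))
```

Feeding it the full driver output from a run and the list of hosts that were given containers shows which hosts never had a message forwarded. It does not say why they are missing.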

So is this something I'm doing wrong? Or is Twill or Kafka somehow shutting
down before all the messages have been sent?

Thanks for any help,
-Erick
