Hi all -

Thanks in advance for your help; I have been totally stuck on this for a
couple of days.

I have a small YARN cluster with one ResourceManager and one NodeManager
as well as one Zookeeper node and one Kafka node - trying to keep the
number of moving parts to a minimum. I've been following the guide to
running Samza on YARN
(https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html),
and I get to the end of the tutorial with a Running job in the YARN web
UI, as expected. However, the job doesn't actually appear to do anything -
messages are not produced to the "wikipedia-raw" topic (nor is the topic
created), and no data is logged at all.

To that point, I am having a ton of trouble with Samza's logging. In
samza.log.dir on the ResourceManager node, there's only gc.log.0.current,
and in the YARN log directory I have only the ResourceManager log, which of
course contains no application information. On the NodeManager side,
samza.log.dir contains application-manager.log, which ends at "[INFO]
Requesting 1 container(s) with 1700mb of memory" right after the job
enters the Running state; its own copy of gc.log.0.current; and stderr
and stdout, which contain no useful information and stop growing after
the first second of the job running. In YARN's logs, there's only the
NodeManager log, which has no errors or warnings and just logs the startup
of the container and then its memory usage from then on, which seems fine:

2015-03-31 20:17:34,635 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 25767 for container-id container_1427823389325_0011_01_000001: 104.9 MB of 1 GB physical memory used; 2.4 GB of 3.1 GB virtual memory used


What am I missing here? WikipediaFeed.java contains a whole bunch of
logging statements, but nothing ever hits any file I can find. Even if you
can't help with the problem I'm having with hello-samza, I would greatly
appreciate any advice on how I can get useful logs from Samza jobs.
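For anyone hunting for the same logs: YARN usually writes each container's stdout, stderr and log files under the NodeManager's `yarn.nodemanager.log-dirs`, in one directory per application and one per container (`<application id>/<container id>/`). This is separate from `samza.log.dir`. The sketch below assumes that layout. It walks a log root and prints every file with its size, so an empty or missing task-container directory stands out. The class name `FindContainerLogs` and its command-line argument are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FindContainerLogs {
    // Returns every regular file under root, sorted by path, so files that
    // belong to the same container directory are listed next to each other.
    static List<Path> listLogFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the NodeManager log root
        // (the value of yarn.nodemanager.log-dirs); defaults to ".".
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listLogFiles(root)) {
            System.out.println(Files.size(p) + "\t" + p);
        }
    }
}
```

If only the application master's container directory shows up (the one ending in `_000001`), that suggests the task container was never launched on this node.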

I've checked that I can ping the Wikipedia IRC URL and consume
from/produce to the Kafka cluster with the console shell scripts from both
the ResourceManager and NodeManager nodes, and other applications can work
with my Kafka and Zookeeper with no issues. From the application-master
log on the worker node, all I can see is that Samza configures the
Wikipedia IRC system, starts the Webapp, and requests a container. It
enters the Running state with YARN, after which point nothing happens at
all. There's no activity at all in the Kafka or Zookeeper logs.

And that's it; the job will run for hours if I let it, but at no point is
anything produced to Kafka or logged at all. I wrote a simpler task that
just accepts a JSON message from a topic on Kafka, adds a timestamp, and
produces to another topic, but the behavior is almost exactly the same.
From the application-master log:

2015-03-31 20:07:05 ClientUtils$ [INFO] Fetching metadata from broker id:0,host:172.31.2.19,port:9092 with correlation id 0 for 1 topic(s) Set(test)
2015-03-31 20:07:05 SyncProducer [INFO] Connected to 172.31.2.19:9092 for producing
2015-03-31 20:07:05 SyncProducer [INFO] Disconnecting from 172.31.2.19:9092
2015-03-31 20:07:06 KafkaSystemAdmin$ [INFO] Got metadata: Map(test -> SystemStreamMetadata [streamName=test, partitionMetadata={Partition [partition=0]=SystemStreamPartitionMetadata [oldestOffset=0, newestOffset=4, upcomingOffset=5], Partition [partition=1]=SystemStreamPartitionMetadata [oldestOffset=null, newestOffset=null, upcomingOffset=0]}])


which all looks correct. Then it connects to the ResourceManager, starts
the Webapp, requests a container, and starts running. All I see in Kafka's
log is

[2015-03-31 20:07:05,999] INFO Closing socket connection to /172.31.1.229. (kafka.network.Processor)
[2015-03-31 20:07:06,090] INFO Closing socket connection to /172.31.1.229. (kafka.network.Processor)


and Zookeeper has nothing to say at all. As before, no new topic is
created.
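For reference, here is a sketch of the per-message transformation the simpler task is meant to perform (accept a JSON message, add a timestamp, re-emit). It assumes the message arrives already deserialized as a `Map<String, Object>`, as it would with a JSON serde. The class `TimestampTransform`, the method `addTimestamp`, and the field name `timestamp` are hypothetical; the original task's code and field names are not shown.

```java
import java.util.HashMap;
import java.util.Map;

public class TimestampTransform {
    // Hypothetical field name; the real task's key is not stated.
    static final String TIMESTAMP_FIELD = "timestamp";

    // Copies the incoming message and adds the given epoch-millis timestamp,
    // leaving the input map unmodified.
    static Map<String, Object> addTimestamp(Map<String, Object> message, long epochMillis) {
        Map<String, Object> out = new HashMap<>(message);
        out.put(TIMESTAMP_FIELD, epochMillis);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> in = new HashMap<>();
        in.put("user", "example");
        Map<String, Object> out = addTimestamp(in, System.currentTimeMillis());
        System.out.println(out);
    }
}
```

Inside a Samza `StreamTask`, the `process(...)` method would apply this to the incoming envelope's message. It would then pass the result to `MessageCollector.send(new OutgoingMessageEnvelope(...))` for the output topic.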

So a huge part of this question is just, what am I missing about logging?
Where are the actual job/task-level logs? Aside from that, I just have no
explanation for why nothing is happening in either of these simple tasks.
I would really appreciate any insight anyone can offer...

Oh, one more thing - there was a warning in the Zookeeper log after
submitting my simple StreamTask that I haven't been able to reproduce:

2015-03-31 19:48:28,145 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.31.2.19:41801
2015-03-31 19:48:28,147 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /172.31.2.19:41801; will be dropped if server is in r-o mode
2015-03-31 19:48:28,148 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /172.31.2.19:41801
2015-03-31 19:48:28,149 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617] - Established session 0x14c70bd0c3e0006 with negotiated timeout 30000 for client /172.31.2.19:41801
2015-03-31 19:48:28,202 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x14c70bd0c3e0006
2015-03-31 19:48:28,206 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /172.31.2.19:41801 which had sessionid 0x14c70bd0c3e0006


172.31.2.19 is the Kafka broker. The job continued unfazed; Samza didn't
log anything about this socket being closed or any kind of error. Not sure
if that's related.


Again, thanks a ton for reading and whatever help you can offer.

Andrew Sannier
Software Engineer, Big Data
C: 480-284-1048
www.helixeducation.com
