Chris -

Wulp, now I feel like a moron, but at least things are running now! Thanks
a lot for helping me diagnose that. On to the next problem...

Andrew Sannier




On 3/31/15, 4:21 PM, "Chris Riccomini" <criccom...@apache.org> wrote:

>Hey Andrew,
>
>It looks like your attachment was stripped by Apache's mailing server.
>Looking at the info you pasted, I can tell you that YARN is most likely
>unable to provision your containers due to memory constraints. Here's the
>issue:
>
>Memory Used: 1 GB
>Memory Total: 1.76 GB
>
>The YARN AM and YARN container both take 1G. Your AM is requesting a 1G
>container, which YARN then queues up, and waits for 1G of space to become
>available. Because you only have 760MB left on the node, this will never
>happen. The AM (and YARN) will just sit and wait for more resources.
>
>To test this theory, try setting:
>
>yarn.am.container.memory.mb=512
>yarn.container.memory.mb=512
>
>The first config sets the AM container's memory to 512MB. The second config
>sets the SamzaContainer's memory to 512MB. Both of these should fit on a
>1.76G node.
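
A minimal sketch of the arithmetic described above, for anyone following along. It assumes a simple first-come, first-served model where each request is granted only if it fits in the remaining node memory, and it ignores any scheduler rounding or overhead. The `allocate` helper and the request names are hypothetical, not part of YARN or Samza, and 1.76 GB is converted at 1024 MB per GB, so the leftover figure comes out slightly different from the rough 760MB quoted above.

```python
def allocate(total_mb, requests_mb):
    """Greedily place each (name, size_mb) request on a single node.

    Hypothetical model, not YARN's actual scheduler: a request is granted
    only if it fits in the memory still free; otherwise it stays pending.
    Returns (granted_names, pending_names, free_mb).
    """
    free_mb = total_mb
    granted, pending = [], []
    for name, size_mb in requests_mb:
        if size_mb <= free_mb:
            free_mb -= size_mb
            granted.append(name)
        else:
            pending.append(name)
    return granted, pending, free_mb


# "Memory Total: 1.76 GB" from the YARN UI, converted at 1024 MB per GB.
NODE_TOTAL_MB = int(1.76 * 1024)

# Both AM and SamzaContainer at 1G each, as in the failing run.
default = allocate(NODE_TOTAL_MB, [("AM", 1024), ("SamzaContainer", 1024)])

# Both reduced to 512MB, as in the suggested settings.
reduced = allocate(NODE_TOTAL_MB, [("AM", 512), ("SamzaContainer", 512)])

print("1G each:    ", default)
print("512MB each: ", reduced)
```

Under these assumptions, the 1G case leaves the SamzaContainer request pending indefinitely, while the 512MB case places both.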
>
>Thanks!
>Chris
>
>On Tue, Mar 31, 2015 at 2:59 PM, Andrew Sannier
><asann...@helixeducation.com> wrote:
>
>> Thanks so much for getting back to me, Chris.
>>
>> I’ve attached the AM log from my most recent attempt to run the
>> hello-samza wikipedia-feed task. I’ve been using pretty small nodes to
>> keep costs down while I test and so forth, so that makes a lot of sense
>> (though I definitely hoped I’d configured appropriate memory ceilings).
>> Here are the values from the YARN UI:
>>
>> Containers Running: 1
>> Memory Used: 1 GB
>>   Memory Total: 1.76 GB
>>   Memory Reserved: 0 B
>>   VCores Used: 1
>>   VCores Total: 8
>>   VCores Reserved: 0
>> Active Nodes: 1
>> Decommissioned Nodes: 0
>> Lost Nodes: 0
>> Unhealthy Nodes: 0
>> Rebooted Nodes: 0
>>
>>
>> Again, much obliged for your response.
>>
>> Andrew Sannier
>>
>>
>>
>> On 3/31/15, 3:54 PM, "Chris Riccomini" <criccom...@apache.org> wrote:
>>
>> >Hey Andrew,
>> >
>> >I'm wondering if your YARN cluster doesn't have enough memory to fit both
>> >the AM and its containers. The fact that the AM UI shows no running
>> >containers is suspicious. Can you check these six settings in your YARN
>> >RM's UI:
>> >
>> >  Memory Used
>> >  Memory Total
>> >  Memory Reserved
>> >  VCores Used
>> >  VCores Total
>> >  VCores Reserved
>> >
>> >Can you also attach (or post to gist/pastebin/etc) the YARN AM's full
>> >log?
>> >
>> >Cheers,
>> >Chris
>> >
>> >On Tue, Mar 31, 2015 at 2:32 PM, Andrew Sannier
>> ><asann...@helixeducation.com> wrote:
>> >
>> >> Something to add here: there are a couple of weird things in the Samza
>> >> Application Master web UI: Application master task ID is -1, which
>> >> seems odd, and the Running Containers table is completely empty. How
>> >> could YARN call a task “Running” if there’s no container?
>> >>
>> >> Thanks,
>> >> Andrew Sannier
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 3/31/15, 2:19 PM, "Andrew Sannier" <asann...@helixeducation.com>
>> >> wrote:
>> >>
>> >> >Hi all -
>> >> >
>> >> >Thanks in advance for your help; I have been totally stuck on this for a
>> >> >couple of days.
>> >> >
>> >> >I have a small YARN cluster with one ResourceManager and one NodeManager
>> >> >as well as one Zookeeper node and one Kafka node - trying to keep the
>> >> >number of moving parts to a minimum. I've been following the guide to
>> >> >running Samza on YARN
>> >> >(https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html),
>> >> >and I get to the end of the tutorial with a Running job in the YARN web
>> >> >UI, as expected. However, the job doesn't actually appear to do anything -
>> >> >messages are not produced to the "wikipedia-raw" topic (nor is the topic
>> >> >created), and no data is logged at all.
>> >> >
>> >> >To that point, I am having a ton of trouble with Samza's logging - in
>> >> >samza.log.dir on the ResourceManager node, there's only gc.log.0.current,
>> >> >and in the YARN log directory I have only the resourcemanager log, which of
>> >> >course contains no application information. On the NodeManager side,
>> >> >samza.log.dir contains application-manager.log, which ends at "[INFO]
>> >> >Requesting 1 container(s) with 1700mb of memory" right after the job
>> >> >enters the Running state, its own copy of gc.log.0.current, and stderr
>> >> >and stdout, which contain no useful information and also don't grow after
>> >> >the first second of the job running. In YARN's logs, there's only the node
>> >> >manager log, which has no errors or warnings and just logs the startup of
>> >> >the container and then its memory usage from then on, which seems fine:
>> >> >
>> >> >2015-03-31 20:17:34,635 INFO  [Container Monitor]
>> >> >monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
>> >> >Memory usage of ProcessTree 25767 for container-id
>> >> >container_1427823389325_0011_01_000001: 104.9 MB of 1 GB physical memory
>> >> >used; 2.4 GB of 3.1 GB virtual memory used
>> >> >
>> >> >
>> >> >What am I missing here? WikipediaFeed.java contains a whole bunch of
>> >> >logging statements, but nothing ever hits any file I can find. Even if you
>> >> >can't help with the problem I'm having with hello-samza, I would greatly
>> >> >appreciate any advice on how I can get useful logs from Samza jobs.
>> >> >
>> >> >I've checked that I can ping the Wikipedia IRC URL and consume
>> >> >from/produce to the Kafka cluster with the console shell scripts from both
>> >> >the ResourceManager and NodeManager nodes, and other applications can work
>> >> >with my Kafka and Zookeeper with no issues. From the application-master
>> >> >log on the worker node, all I can see is that Samza configures the
>> >> >Wikipedia IRC system, starts the Webapp, and requests a container. It
>> >> >enters the Running state with YARN, after which point nothing happens at
>> >> >all. There's no activity at all in the Kafka or Zookeeper logs.
>> >> >
>> >> >And that's it; the job will run for hours if I let it, but at no point is
>> >> >anything produced to Kafka or logged at all. I wrote a simpler task that
>> >> >just accepts a json message from a topic on Kafka, adds a timestamp, and
>> >> >produces to another topic, but almost nothing is different. From the
>> >> >application-master log:
>> >> >
>> >> >2015-03-31 20:07:05 ClientUtils$ [INFO] Fetching metadata from broker
>> >> >id:0,host:172.31.2.19,port:9092 with correlation id 0 for 1 topic(s)
>> >> >Set(test)
>> >> >2015-03-31 20:07:05 SyncProducer [INFO] Connected to 172.31.2.19:9092 for
>> >> >producing
>> >> >2015-03-31 20:07:05 SyncProducer [INFO] Disconnecting from
>> >> >172.31.2.19:9092
>> >> >2015-03-31 20:07:06 KafkaSystemAdmin$ [INFO] Got metadata: Map(test ->
>> >> >SystemStreamMetadata [streamName=test, partitionMetadata={Partition
>> >> >[partition=0]=SystemStreamPartitionMetadata [oldestOffset=0,
>> >> >newestOffset=4, upcomingOffset=5], Partition
>> >> >[partition=1]=SystemStreamPartitionMetadata [oldestOffset=null,
>> >> >newestOffset=null, upcomingOffset=0]}])
>> >> >
>> >> >
>> >> >which all looks correct. Then it connects to the ResourceManager, starts
>> >> >the Webapp, requests a container, and starts running. All I see in
>> >> >Kafka's log is
>> >> >
>> >> >[2015-03-31 20:07:05,999] INFO Closing socket connection to /172.31.1.229.
>> >> >(kafka.network.Processor)
>> >> >[2015-03-31 20:07:06,090] INFO Closing socket connection to /172.31.1.229.
>> >> >(kafka.network.Processor)
>> >> >
>> >> >
>> >> >and Zookeeper has nothing to say at all. As before, no new topic is
>> >> >created.
>> >> >
>> >> >So a huge part of this question is just, what am I missing about logging?
>> >> >Where are the actual job/task-level logs? Aside from that, I just have no
>> >> >explanation for why nothing is happening in either of these simple tasks.
>> >> >I would really appreciate any insight anyone can offer...
>> >> >
>> >> >Oh, one more thing - there was an error message in Zookeeper after
>> >> >submitting my simple StreamTask that I haven't been able to reproduce:
>> >> >
>> >> >2015-03-31 19:48:28,145 [myid:] - INFO
>> >> >[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] -
>> >> >Accepted socket connection from /172.31.2.19:41801
>> >> >2015-03-31 19:48:28,147 [myid:] - WARN
>> >> >[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] -
>> >> >Connection request from old client /172.31.2.19:41801; will be dropped if
>> >> >server is in r-o mode
>> >> >2015-03-31 19:48:28,148 [myid:] - INFO
>> >> >[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client
>> >> >attempting to establish new session at /172.31.2.19:41801
>> >> >2015-03-31 19:48:28,149 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617]
>> >> >- Established session 0x14c70bd0c3e0006 with negotiated timeout 30000 for
>> >> >client /172.31.2.19:41801
>> >> >2015-03-31 19:48:28,202 [myid:] - INFO  [ProcessThread(sid:0
>> >> >cport:-1)::PrepRequestProcessor@494] - Processed session termination for
>> >> >sessionid: 0x14c70bd0c3e0006
>> >> >2015-03-31 19:48:28,206 [myid:] - INFO
>> >> >[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed
>> >> >socket connection for client /172.31.2.19:41801 which had sessionid
>> >> >0x14c70bd0c3e0006
>> >> >
>> >> >
>> >> >172.31.2.19 is the Kafka broker. The job continued unfazed; Samza didn't
>> >> >log anything about this socket being closed or any kind of error. Not sure
>> >> >if that's related.
>> >> >
>> >> >
>> >> >Again, thanks a ton for reading and whatever help you can offer.
>> >> >
>> >> >Andrew Sannier
>> >> >Software Engineer, Big Data
>> >> >C: 480-284-1048
>> >> >www.helixeducation.com
>> >> >
>> >> >
>> >>
>> >>
>>
>>
