Hi Jordi,

Can you post your task.opts settings as well? The Xms and Xmx JVM opts play a role here too: the Xmx size should be set to less than yarn.container.memory.mb.
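From the process-tree dump below, the AM was launched with -Xmx768M, which is well above the 256 MB you gave the container; the JVM reserves address space for the whole max heap, so that alone can account for the 1.2 GB of virtual memory (and the 537.6 MB limit is just 256 MB times YARN's default vmem-pmem ratio of 2.1). As a rough sketch (the exact heap values here are illustrative, not a recommendation), something like this in the job properties keeps the heap inside the container:

yarn.container.memory.mb=256
yarn.am.container.memory.mb=256
# Keep max heap below the 256 MB container so heap plus
# JVM overhead fits in what YARN allocated.
task.opts=-Xms128M -Xmx200M
yarn.am.opts=-Xms128M -Xmx200M

If the virtual memory check still trips after that, raising yarn.nodemanager.vmem-pmem-ratio in yarn-site.xml is another knob, but getting the heap sizes right is the first step.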
-Yi

On Tue, Sep 22, 2015 at 4:32 AM, Jordi Blasi Uribarri <jbl...@nextel.es> wrote:
> I am seeing that I cannot get even a single job running. I have restored
> the original configuration of yarn-site.xml and capacity-scheduler.xml, and
> that does not work either. I am thinking that maybe some information
> related to old jobs was not correctly cleaned up when killing them. Is
> there any place where I can look to remove temporary files or something
> similar?
>
> Thanks,
>
> Jordi
>
> -----Original Message-----
> From: Jordi Blasi Uribarri [mailto:jbl...@nextel.es]
> Sent: Tuesday, September 22, 2015 10:06
> To: dev@samza.apache.org
> Subject: container is running beyond virtual memory limits
>
> Hi,
>
> I am not really sure if this is related to any of the previous questions,
> so I am asking it in a new message. I am running three different Samza
> jobs that perform different actions and exchange information. As I found
> memory limits that were preventing the jobs from getting from ACCEPTED to
> RUNNING, I introduced some configuration in YARN, as suggested on this
> list:
>
> yarn-site.xml
>
> <configuration>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-mb</name>
>     <value>128</value>
>     <description>Minimum limit of memory to allocate to each container
>     request at the Resource Manager.</description>
>   </property>
>   <property>
>     <name>yarn.scheduler.maximum-allocation-mb</name>
>     <value>512</value>
>     <description>Maximum limit of memory to allocate to each container
>     request at the Resource Manager.</description>
>   </property>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-vcores</name>
>     <value>1</value>
>     <description>The minimum allocation for every container request at
>     the RM, in terms of virtual CPU cores. Requests lower than this won't
>     take effect, and the specified value will get allocated the
>     minimum.</description>
>   </property>
>   <property>
>     <name>yarn.scheduler.maximum-allocation-vcores</name>
>     <value>2</value>
>     <description>The maximum allocation for every container request at
>     the RM, in terms of virtual CPU cores. Requests higher than this won't
>     take effect, and will get capped to this value.</description>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.hostname</name>
>     <value>kfk-samza01</value>
>   </property>
> </configuration>
>
> capacity-scheduler.xml (altered this value)
>
> <property>
>   <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
>   <value>0.5</value>
>   <description>
>     Maximum percent of resources in the cluster which can be used to run
>     application masters, i.e. controls the number of concurrent running
>     applications.
>   </description>
> </property>
>
> The jobs are configured to reduce memory usage:
>
> yarn.container.memory.mb=256
> yarn.am.container.memory.mb=256
>
> After introducing these changes I saw a very noticeable reduction in
> speed. That seemed normal, as the memory assigned to the jobs was lowered
> and more of them were running. Everything was working until yesterday,
> but today the jobs are not moving from ACCEPTED to RUNNING.
> I have found the following in the log (full log at the end):
>
> 2015-09-22 09:54:36,661 INFO [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
> Memory usage of ProcessTree 10346 for container-id
> container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical memory
> used; 1.2 GB of 537.6 MB virtual memory used
>
> I am not sure where that 1.2 GB comes from; it is what makes the
> processes die.
>
> Thanks,
>
> Jordi
>
> 2015-09-22 09:54:36,519 INFO [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) -
> Removed ProcessTree with root 10271
> 2015-09-22 09:54:36,519 INFO [AsyncDispatcher event handler]
> container.Container (ContainerImpl.java:handle(999)) - Container
> container_1442908447829_0002_01_000001 transitioned from RUNNING to
> KILLING
> 2015-09-22 09:54:36,533 INFO [AsyncDispatcher event handler]
> launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370)) -
> Cleaning up container container_1442908447829_0002_01_000001
> 2015-09-22 09:54:36,661 INFO [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
> Memory usage of ProcessTree 10346 for container-id
> container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical memory
> used; 1.2 GB of 537.6 MB virtual memory used
> 2015-09-22 09:54:36,661 WARN [Container Monitor]
> monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:isProcessTreeOverLimit(293)) - Process tree
> for container: container_1442908447829_0001_01_000001 running over twice
> the configured limit. Limit=563714432, current usage = 1269743616
> 2015-09-22 09:54:36,662 WARN [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(447)) -
> Container [pid=10346,containerID=container_1442908447829_0001_01_000001]
> is running beyond virtual memory limits. Current usage: 70.0 MB of 256 MB
> physical memory used; 1.2 GB of 537.6 MB virtual memory used. Killing
> container.
> Dump of the process-tree for container_1442908447829_0001_01_000001 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 10346 10344 10346 10346 (java) 253 7 1269743616 17908
> /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server
> -Dsamza.container.name=samza-application-master
> -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001/container_1442908447829_0001_01_000001
> -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/tmp
> -Xmx768M -XX:+PrintGCDateStamps
> -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001/container_1442908447829_0001_01_000001/gc.log
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=10241024 -d64 -cp
> /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-core-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-jaxrs-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBroker-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar
> org.apache.samza.job.yarn.SamzaAppMaster
>
> 2015-09-22 09:54:36,663 INFO [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) -
> Removed ProcessTree with root 10346
> 2015-09-22 09:54:36,663 INFO [AsyncDispatcher event handler]
> container.Container (ContainerImpl.java:handle(999)) - Container
> container_1442908447829_0001_01_000001 transitioned from RUNNING to
> KILLING
> 2015-09-22 09:54:36,663 INFO [AsyncDispatcher event handler]
> launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370)) -
> Cleaning up container container_1442908447829_0001_01_000001
>
> ________________________________
> Jordi Blasi Uribarri
> R&D Department
>
> jbl...@nextel.es
> Bilbao Office