Hi,

I am not sure whether this is related to any of the previous questions, so I am asking it in a new message. I am running three different Samza jobs that perform different actions and exchange information. Because memory limits were preventing the jobs from moving from ACCEPTED to RUNNING, I introduced some configuration changes in YARN, as suggested on this list:

yarn-site.xml

<configuration>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
    <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>512</value>
    <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
    <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
    <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>kfk-samza01</value>
  </property>
</configuration>

capacity-scheduler.xml

Alter value:

<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
  <description>
    Maximum percent of resources in the cluster which can be used to run
    application masters i.e. controls number of concurrent running
    applications.
  </description>
</property>

The jobs are configured to reduce the memory usage:

yarn.container.memory.mb=256
yarn.am.container.memory.mb=256

After introducing these changes I saw a very noticeable drop in speed. That seemed normal, since the memory assigned to each job was lower and more jobs were running. Everything was running until yesterday, but today the jobs are not moving from ACCEPTED to RUNNING.

I have found the following in the log (full log at the end):

2015-09-22 09:54:36,661 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 10346 for container-id container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical memory used; 1.2 GB of 537.6 MB virtual memory used

I am not sure where that 1.2 GB comes from; it is what makes the processes die.

Thanks,

Jordi

2015-09-22 09:54:36,519 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) - Removed ProcessTree with root 10271
2015-09-22 09:54:36,519 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(999)) - Container container_1442908447829_0002_01_000001 transitioned from RUNNING to KILLING
2015-09-22 09:54:36,533 INFO [AsyncDispatcher event handler] launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370)) - Cleaning up container container_1442908447829_0002_01_000001
2015-09-22 09:54:36,661 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 10346 for container-id container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical memory used; 1.2 GB of 537.6 MB virtual memory used
2015-09-22 09:54:36,661 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:isProcessTreeOverLimit(293)) - Process tree for container: container_1442908447829_0001_01_000001 running over twice the configured limit. Limit=563714432, current usage = 1269743616
2015-09-22 09:54:36,662 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(447)) - Container [pid=10346,containerID=container_1442908447829_0001_01_000001] is running beyond virtual memory limits. Current usage: 70.0 MB of 256 MB physical memory used; 1.2 GB of 537.6 MB virtual memory used. Killing container.
Dump of the process-tree for container_1442908447829_0001_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 10346 10344 10346 10346 (java) 253 7 1269743616 17908 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server -Dsamza.container.name=samza-application-master -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001/container_1442908447829_0001_01_000001 -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/tmp -Xmx768M -XX:+PrintGCDateStamps -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001/container_1442908447829_0001_01_000001/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10241024 -d64 -cp /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-core-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-jaxrs-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBroker-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar org.apache.samza.job.yarn.SamzaAppMaster

2015-09-22 09:54:36,663 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) - Removed ProcessTree with root 10346
2015-09-22 09:54:36,663 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(999)) - Container container_1442908447829_0001_01_000001 transitioned from RUNNING to KILLING
2015-09-22 09:54:36,663 INFO [AsyncDispatcher event handler] launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370)) - Cleaning up container container_1442908447829_0001_01_000001

________________________________
Jordi Blasi Uribarri
Área I+D+i
jbl...@nextel.es
Oficina Bilbao
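
A note on the numbers in the log: the 537.6 MB virtual memory limit reported for a 256 MB container is consistent with the NodeManager multiplying the physical limit by yarn.nodemanager.vmem-pmem-ratio, whose default is 2.1. That the default applies here is an assumption, since the posted yarn-site.xml does not set it. The sketch below only redoes that arithmetic with the figures from the log; the helper names are hypothetical, and the use of a 32-bit float for the ratio is a guess that happens to reproduce the logged byte count.

```python
import struct

MIB = 1024 * 1024


def as_float32(x: float) -> float:
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def vmem_limit_bytes(pmem_mb: int, ratio: float = 2.1) -> int:
    # Assumption: limit = physical limit * ratio, with the ratio held
    # as a 32-bit float (2.1 is the documented default ratio).
    return int(as_float32(ratio) * pmem_mb * MIB)


limit = vmem_limit_bytes(256)   # yarn.am.container.memory.mb=256
usage = 1269743616              # VMEM_USAGE(BYTES) from the process-tree dump
xmx = 768 * MIB                 # -Xmx768M from the dumped command line

print(limit)                    # matches "Limit=563714432" in the log
print(round(limit / MIB, 1))    # 537.6, as reported by the monitor
print(usage > 2 * limit)        # True: "running over twice the configured limit"
print(xmx > limit)              # True: the max heap alone exceeds the vmem limit
```

The last comparison may be worth a look: the dumped command line starts the application master with -Xmx768M, and a JVM typically reserves address space for its maximum heap, so the virtual size could exceed a 537.6 MB limit even while only 70 MB is resident.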