Hi, Jordi,

Please find the config variable task.opts in this table:
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html
This allows you to add additional JVM opts when launching the containers.
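For a concrete sketch (the heap values here are illustrative, not tested against your job), adding something like

  task.opts=-Xmx160m -Xms64m

to the job properties keeps the container heap below your 256 MB yarn.container.memory.mb. One thing worth noting from your log: the container being killed (container_1442908447829_0001_01_000001, running org.apache.samza.job.yarn.SamzaAppMaster) is the ApplicationMaster, which launches with the default -Xmx768M; if I remember correctly, yarn.am.opts in the same table is the analogous setting for the AM JVM. Also, the 537.6 MB limit in your log is just yarn.container.memory.mb (256 MB) times YARN's default yarn.nodemanager.vmem-pmem-ratio of 2.1, which is presumably why a JVM reserving a 768 MB max heap pushes its virtual size (1.2 GB here) over that limit.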
-Yi

On Mon, Sep 28, 2015 at 1:48 AM, Jordi Blasi Uribarri <jbl...@nextel.es> wrote:

> The three tasks have a similar options file, like this one:
>
> task.class=flow.OperationJob
> job.name=flow.OperationJob
> job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
> yarn.package.path=http://IP/javaapp.tar.gz
>
> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> systems.kafka.consumer.zookeeper.connect=kfk-kafka01:2181,kfk-kafka02:2181
> systems.kafka.producer.bootstrap.servers=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:9093
> systems.kafka.producer.metadata.broker.list=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:9093
>
> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
> task.checkpoint.system=kafka
> task.inputs=kafka.operationtpc
>
> serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
> serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
>
> systems.kafka.samza.msg.serde=string
> systems.kafka.streams.tracetpc.samza.msg.serde=json
>
> yarn.container.memory.mb=256
> yarn.am.container.memory.mb=256
>
> task.commit.ms=1000
> task.window.ms=60000
>
> Where do I have to change the Xmx parameter?
>
> Thanks.
>
> Jordi
>
> -----Original Message-----
> From: Yi Pan [mailto:nickpa...@gmail.com]
> Sent: Monday, September 28, 2015 10:39
> To: dev@samza.apache.org
> Subject: Re: container is running beyond virtual memory limits
>
> Hi, Jordi,
>
> Can you post your task.opts settings as well? The Xms and Xmx JVM opts
> will play a role here too. The Xmx size should be set to less than
> yarn.container.memory.mb.
>
> -Yi
>
> On Tue, Sep 22, 2015 at 4:32 AM, Jordi Blasi Uribarri <jbl...@nextel.es> wrote:
>
> > I am seeing that I cannot get even a single job running. I have
> > recovered the original configuration of yarn-site.xml and
> > capacity-scheduler.xml, and that does not work. I am thinking that
> > maybe there is some information related to old jobs that was not
> > correctly cleaned up when they were killed. Is there any place where
> > I can look to remove temporary files or something similar?
> >
> > Thanks
> >
> > jordi
> >
> > -----Original Message-----
> > From: Jordi Blasi Uribarri [mailto:jbl...@nextel.es]
> > Sent: Tuesday, September 22, 2015 10:06
> > To: dev@samza.apache.org
> > Subject: container is running beyond virtual memory limits
> >
> > Hi,
> >
> > I am not really sure if this is related to any of the previous
> > questions, so I am asking it in a new message. I am running three
> > different Samza jobs that perform different actions and exchange
> > information.
> > As I found memory limits that were preventing the jobs from getting
> > from ACCEPTED to RUNNING, I introduced some configuration in YARN, as
> > suggested on this list:
> >
> > yarn-site.xml
> >
> > <configuration>
> >   <property>
> >     <name>yarn.scheduler.minimum-allocation-mb</name>
> >     <value>128</value>
> >     <description>Minimum limit of memory to allocate to each container
> >     request at the Resource Manager.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.maximum-allocation-mb</name>
> >     <value>512</value>
> >     <description>Maximum limit of memory to allocate to each container
> >     request at the Resource Manager.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.minimum-allocation-vcores</name>
> >     <value>1</value>
> >     <description>The minimum allocation for every container request at
> >     the RM, in terms of virtual CPU cores. Requests lower than this won't
> >     take effect, and the specified value will get allocated the
> >     minimum.</description>
> >   </property>
> >   <property>
> >     <name>yarn.scheduler.maximum-allocation-vcores</name>
> >     <value>2</value>
> >     <description>The maximum allocation for every container request at
> >     the RM, in terms of virtual CPU cores. Requests higher than this won't
> >     take effect, and will get capped to this value.</description>
> >   </property>
> >   <property>
> >     <name>yarn.resourcemanager.hostname</name>
> >     <value>kfk-samza01</value>
> >   </property>
> > </configuration>
> >
> > capacity-scheduler.xml (altered value)
> >
> >   <property>
> >     <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
> >     <value>0.5</value>
> >     <description>
> >       Maximum percent of resources in the cluster which can be used to run
> >       application masters, i.e. controls the number of concurrently running
> >       applications.
> >     </description>
> >   </property>
> >
> > The jobs are configured to reduce memory usage:
> >
> > yarn.container.memory.mb=256
> > yarn.am.container.memory.mb=256
> >
> > After introducing these changes I saw a very noticeable slowdown. That
> > seemed normal, as the memory assigned to the jobs was lowered and there
> > were more of them running. Everything kept running until yesterday, but
> > today the jobs are not moving from ACCEPTED to RUNNING. I have found the
> > following in the log (full log at the end):
> >
> > 2015-09-22 09:54:36,661 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 10346 for container-id container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical memory used; 1.2 GB of 537.6 MB virtual memory used
> >
> > I am not sure where that 1.2 GB comes from or why it makes the processes die.
> >
> > Thanks,
> >
> > Jordi
> >
> > 2015-09-22 09:54:36,519 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) - Removed ProcessTree with root 10271
> > 2015-09-22 09:54:36,519 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(999)) - Container container_1442908447829_0002_01_000001 transitioned from RUNNING to KILLING
> > 2015-09-22 09:54:36,533 INFO [AsyncDispatcher event handler] launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370)) - Cleaning up container container_1442908447829_0002_01_000001
> > 2015-09-22 09:54:36,661 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 10346 for container-id container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical memory used; 1.2 GB of 537.6 MB virtual memory used
> > 2015-09-22 09:54:36,661 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:isProcessTreeOverLimit(293)) - Process tree for container: container_1442908447829_0001_01_000001 running over twice the configured limit. Limit=563714432, current usage = 1269743616
> > 2015-09-22 09:54:36,662 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(447)) - Container [pid=10346,containerID=container_1442908447829_0001_01_000001] is running beyond virtual memory limits. Current usage: 70.0 MB of 256 MB physical memory used; 1.2 GB of 537.6 MB virtual memory used. Killing container.
> > Dump of the process-tree for container_1442908447829_0001_01_000001 :
> > |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> > |- 10346 10344 10346 10346 (java) 253 7 1269743616 17908 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server
> >    -Dsamza.container.name=samza-application-master
> >    -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001/container_1442908447829_0001_01_000001
> >    -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/tmp
> >    -Xmx768M -XX:+PrintGCDateStamps
> >    -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001/container_1442908447829_0001_01_000001/gc.log
> >    -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10241024 -d64 -cp
> >    /opt/hadoop-2.6.0/conf:
> >    /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-annotations-2.6.0.jar:
> >    /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-core-2.6.0.jar:
> >    /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-databind-2.6.0.jar:
> >    /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:
> >    /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-jaxrs-json-provider-2.6.0.jar:
> >    /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:
> >    /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBroker-0.0.1.jar:
> >    /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1442908447829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar
> >    org.apache.samza.job.yarn.SamzaAppMaster
> >
> > 2015-09-22 09:54:36,663 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) - Removed ProcessTree with root 10346
> > 2015-09-22 09:54:36,663 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(999)) - Container container_1442908447829_0001_01_000001 transitioned from RUNNING to KILLING
> > 2015-09-22 09:54:36,663 INFO [AsyncDispatcher event handler] launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370)) - Cleaning up container container_1442908447829_0001_01_000001
> >
> > ________________________________
> > Jordi Blasi Uribarri
> > R&D&i Department
> >
> > jbl...@nextel.es
> > Bilbao Office