The three tasks have similar options files, like this one:
task.class=flow.OperationJob
job.name=flow.OperationJob
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
yarn.package.path=http://IP/javaapp.tar.gz
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=kfk-kafka01:2181,kfk-kafka02:2181
systems.kafka.producer.bootstrap.servers=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:9093
systems.kafka.producer.metadata.broker.list=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:909
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=kafka
task.inputs=kafka.operationtpc
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
systems.kafka.samza.msg.serde=string
systems.kafka.streams.tracetpc.samza.msg.serde=json
yarn.container.memory.mb=256
yarn.am.container.memory.mb=256
task.commit.ms=1000
task.window.ms=60000
Where do I have to change the Xmx parameter?
Thanks.
Jordi
-----Original Message-----
From: Yi Pan [mailto:[email protected]]
Sent: Monday, September 28, 2015 10:39
To: [email protected]
Subject: Re: container is running beyond virtual memory limits
Hi, Jordi,
Can you post your task.opts settings as well? The Xms and Xmx JVM opts will
play a role here as well. The Xmx size should be set to less than
yarn.container.memory.mb.
-Yi
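
The log quoted below reports a virtual memory limit of 537.6 MB for a 256 MB container. A minimal sketch of where that number may come from, assuming the NodeManager multiplies the physical limit by yarn.nodemanager.vmem-pmem-ratio and that the ratio is left at YARN's default of 2.1 (the yarn-site.xml in this thread does not set it). The function name vmem_limit_bytes is hypothetical, and the rounding to 32-bit float precision is an assumption chosen because it reproduces the byte count in the log:

```python
import struct

MB = 1024 * 1024

def vmem_limit_bytes(container_mb, vmem_pmem_ratio=2.1):
    """Virtual memory ceiling: physical limit times the vmem/pmem ratio.

    Assumption: the ratio is held as a 32-bit float, so it is rounded
    to float32 precision before multiplying.
    """
    ratio_f32 = struct.unpack("f", struct.pack("f", vmem_pmem_ratio))[0]
    return int(container_mb * MB * ratio_f32)

limit = vmem_limit_bytes(256)      # yarn.am.container.memory.mb=256
used = 1269743616                  # "current usage" from the quoted log

print(limit)                       # 563714432, the "Limit=" value in the log
print(round(limit / MB, 1))        # 537.6
print(used > 2 * limit)            # True: "running over twice the configured limit"
```

Note that the process dump quoted below shows the application master started with -Xmx768M, which is larger than the 256 MB requested for its container.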
On Tue, Sep 22, 2015 at 4:32 AM, Jordi Blasi Uribarri <[email protected]>
wrote:
> I cannot get even a single job running. I have restored the original
> configuration of yarn-site.xml and capacity-scheduler.xml, and that
> does not work either. I am thinking that maybe some information
> related to old jobs was not cleaned up correctly when I killed them.
> Is there any place where I can look to remove temporary files or
> something similar?
>
> Thanks
>
> jordi
>
> -----Original Message-----
> From: Jordi Blasi Uribarri [mailto:[email protected]]
> Sent: Tuesday, September 22, 2015 10:06
> To: [email protected]
> Subject: container is running beyond virtual memory limits
>
> Hi,
>
> I am not really sure if this is related to any of the previous
> questions, so I am asking it in a new message. I am running three
> different Samza jobs that perform different actions and exchange
> information. As I hit memory limits that were preventing the jobs
> from moving from ACCEPTED to RUNNING, I introduced some configuration
> changes in YARN, as suggested on this list:
>
>
> yarn-site.xml
>
> <configuration>
> <property>
> <name>yarn.scheduler.minimum-allocation-mb</name>
> <value>128</value>
> <description>Minimum limit of memory to allocate to each container
> request at the Resource Manager.</description>
> </property>
> <property>
> <name>yarn.scheduler.maximum-allocation-mb</name>
> <value>512</value>
> <description>Maximum limit of memory to allocate to each container
> request at the Resource Manager.</description>
> </property>
> <property>
> <name>yarn.scheduler.minimum-allocation-vcores</name>
> <value>1</value>
> <description>The minimum allocation for every container request at
> the RM, in terms of virtual CPU cores. Requests lower than this won't
> take effect, and the specified value will get allocated the
> minimum.</description>
> </property>
> <property>
> <name>yarn.scheduler.maximum-allocation-vcores</name>
> <value>2</value>
> <description>The maximum allocation for every container request at
> the RM, in terms of virtual CPU cores. Requests higher than this won't
> take effect, and will get capped to this value.</description>
> </property>
> <property>
> <name>yarn.resourcemanager.hostname</name>
> <value>kfk-samza01</value>
> </property>
> </configuration>
>
> capacity-scheduler.xml
> Alter value
> <property>
> <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
> <value>0.5</value>
> <description>
> Maximum percent of resources in the cluster which can be used to run
> application masters i.e. controls number of concurrent running
> applications.
> </description>
> </property>
>
> The jobs are configured to reduce the memory usage:
>
> yarn.container.memory.mb=256
> yarn.am.container.memory.mb=256
>
> After introducing these changes I experienced a very noticeable
> reduction in speed. That seemed normal, as the memory assigned to the
> jobs was lowered and more of them were running. Everything was running
> until yesterday.
>
> What I have seen today is that the jobs are not moving from ACCEPTED to
> RUNNING. I have found the following in the log (full log at the end):
>
> 2015-09-22 09:54:36,661 INFO [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
> Memory usage of ProcessTree 10346 for container-id
> container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical
> memory used; 1.2 GB of 537.6 MB virtual memory used
>
> I am not sure where that 1.2 GB comes from, and it makes the processes die.
>
> Thanks,
>
> Jordi
>
>
>
>
> 2015-09-22 09:54:36,519 INFO [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) -
> Removed ProcessTree with root 10271
> 2015-09-22 09:54:36,519 INFO [AsyncDispatcher event handler]
> container.Container (ContainerImpl.java:handle(999)) - Container
> container_1442908447829_0002_01_000001 transitioned from RUNNING to
> KILLING
> 2015-09-22 09:54:36,533 INFO [AsyncDispatcher event handler]
> launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370))
> - Cleaning up container container_1442908447829_0002_01_000001
> 2015-09-22 09:54:36,661 INFO [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) -
> Memory usage of ProcessTree 10346 for container-id
> container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical
> memory used; 1.2 GB of 537.6 MB virtual memory used
> 2015-09-22 09:54:36,661 WARN [Container Monitor]
> monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:isProcessTreeOverLimit(293)) - Process
> tree for
> container: container_1442908447829_0001_01_000001 running over twice
> the configured limit. Limit=563714432, current usage = 1269743616
> 2015-09-22 09:54:36,662 WARN [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(447)) -
> Container
> [pid=10346,containerID=container_1442908447829_0001_01_000001] is
> running beyond virtual memory limits. Current usage: 70.0 MB of 256 MB
> physical memory used; 1.2 GB of 537.6 MB virtual memory used. Killing
> container.
> Dump of the process-tree for container_1442908447829_0001_01_000001 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 10346 10344 10346 10346 (java) 253 7 1269743616 17908
> /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server
> -Dsamza.container.name=samza-application-master
> -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1442908447
> 829_0001/container_1442908447829_0001_01_000001
> -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache
> /application_1442908447829_0001/container_1442908447829_0001_01_000001
> /__package/tmp
> -Xmx768M -XX:+PrintGCDateStamps
> -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_0001
> /container_1442908447829_0001_01_000001/gc.log
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=10241024 -d64 -cp
> /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/ap
> pcache/application_1442908447829_0001/container_1442908447829_0001_01_
> 000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm
> -local-dir/usercache/root/appcache/application_1442908447829_0001/cont
> ainer_1442908447829_0001_01_000001/__package/lib/jackson-core-2.6.0.ja
> r:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_14
> 42908447829_0001/container_1442908447829_0001_01_000001/__package/lib/
> jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/roo
> t/appcache/application_1442908447829_0001/container_1442908447829_0001
> _01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoo
> p-root/nm-local-dir/usercache/root/appcache/application_1442908447829_
> 0001/container_1442908447829_0001_01_000001/__package/lib/jackson-jaxr
> s-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root
> /appcache/application_1442908447829_0001/container_1442908447829_0001_
> 01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:/tmp
> /hadoop-root/nm-local-dir/usercache/root/appcache/application_14429084
> 47829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtBro
> ker-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/ap
> plication_1442908447829_0001/container_1442908447829_0001_01_000001/__
> package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar
> org.apache.samza.job.yarn.SamzaAppMaster
>
> 2015-09-22 09:54:36,663 INFO [Container Monitor]
> monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) -
> Removed ProcessTree with root 10346
> 2015-09-22 09:54:36,663 INFO [AsyncDispatcher event handler]
> container.Container (ContainerImpl.java:handle(999)) - Container
> container_1442908447829_0001_01_000001 transitioned from RUNNING to
> KILLING
> 2015-09-22 09:54:36,663 INFO [AsyncDispatcher event handler]
> launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(370))
> - Cleaning up container container_1442908447829_0001_01_000001
> ________________________________
> Jordi Blasi Uribarri
> R&D&I Department
>
> [email protected]
> Oficina Bilbao
>