To give as complete a view of my situation as possible, I am compiling what I have
done and what my problem is, so that you have the fullest information.
I have done the following on two virtual machines, each with 4 cores and 4 GB of RAM.
Install Debian 7.8, plain, with no graphical interface.
apt-get install openjdk-7-jdk openjdk-7-jre git maven curl
git clone http://git-wip-us.apache.org/repos/asf/samza.git
gradlew clean build
As there was a bug in the Keyrocks testing script, I just commented out the failing
code in the TestTTL script.
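As a side note, I believe the same build can also be run with the test suites skipped
entirely, using the standard Gradle exclude option, which would avoid editing the
script (I have not verified this against this exact repository):
gradlew clean build -x test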
wget http://apache.rediris.es/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar -xvf hadoop-2.6.0.tar.gz
vi conf/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>kfk-samza01</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>128</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>3</value>
</property>
</configuration>
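For completeness: I have not touched the node manager virtual memory check, which as
far as I know stays at its defaults (shown here only for reference, they are not in my
yarn-site.xml):
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
If I understand the ratio correctly, 2.1 x 256 MB = 537.6 MB, which matches the
virtual memory limit that appears in the errors further down.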
cp ./etc/hadoop/capacity-scheduler.xml conf
vi $HADOOP_YARN_HOME/conf/core-site.xml
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.http.impl</name>
<value>org.apache.samza.util.hadoop.HttpFileSystem</value>
</property>
</configuration>
curl http://www.scala-lang.org/files/archive/scala-2.10.4.tgz > scala-2.10.4.tgz
tar -xvf scala-2.10.4.tgz
cp /tmp/scala-2.10.4/lib/scala-compiler.jar $HADOOP_YARN_HOME/share/hadoop/hdfs/lib
cp /tmp/scala-2.10.4/lib/scala-library.jar $HADOOP_YARN_HOME/share/hadoop/hdfs/lib
curl -L http://search.maven.org/remotecontent?filepath=org/clapper/grizzled-slf4j_2.10/1.0.1/grizzled-slf4j_2.10-1.0.1.jar > $HADOOP_YARN_HOME/share/hadoop/hdfs/lib/grizzled-slf4j_2.10-1.0.1.jar
curl -L http://search.maven.org/remotecontent?filepath=org/apache/samza/samza-yarn_2.10/0.9.1/samza-yarn_2.10-0.9.1.jar > $HADOOP_YARN_HOME/share/hadoop/hdfs/lib/samza-yarn_2.10-0.9.1.jar
curl -L http://search.maven.org/remotecontent?filepath=org/apache/samza/samza-core_2.10/0.9.1/samza-core_2.10-0.9.1.jar > $HADOOP_YARN_HOME/share/hadoop/hdfs/lib/samza-core_2.10-0.9.1.jar
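As a quick sanity check that the extra jars ended up in that directory, I list them
afterwards:
ls $HADOOP_YARN_HOME/share/hadoop/hdfs/lib | grep -E 'scala|grizzled|samza'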
cd /opt/hadoop-2.6.0/
scp -r . 192.168.15.94:/opt/hadoop-2.6.0
echo 192.168.15.92 >> conf/slaves
echo 192.168.15.94 >> conf/slaves
sbin/start-yarn.sh
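To confirm that both node managers registered with the resource manager, I run the
standard YARN CLI from the Hadoop directory:
bin/yarn node -list
bin/yarn application -list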
I have copied all the scripts from the /opt/samza/samza-shell/src/main/bash/ folder
into /opt/jobs/bin.
I have generated an Eclipse project via Maven with the Samza dependencies included
and no jobs, packaged it and copied it to /opt/jobs/lib.
I have generated another Eclipse project via Maven with the Samza dependencies
included and three jobs that implement StreamTask and InitableTask. The functions are
empty, for testing purposes. It is published in a folder served through the Apache
web server.
I have created the associated job options file in the /opt/job/dtan folder like
this:
task.class=flow.WorkFlow
job.name=flow.WorkFlow
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
yarn.package.path=http://192.168.15.92/jobs/DataAnalyzer-0.0.1-bin.tar.gz
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=kfk-kafka01:2181,kfk-kafka02:2181
systems.kafka.producer.bootstrap.servers=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:9093
systems.kafka.producer.metadata.broker.list=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:909
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=kafka
task.inputs=kafka.flowtpc
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
systems.kafka.samza.msg.serde=string
systems.kafka.streams.tracetpc.samza.msg.serde=json
yarn.container.memory.mb=256
yarn.am.container.memory.mb=256
task.opts= -Xms128M -Xmx128M
task.commit.ms=100
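For reference, this is how I launch each job (the properties file name here is just
an example of mine):
/opt/jobs/bin/run-job.sh \
  --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory \
  --config-path=file:///opt/job/dtan/workflow.properties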
What I see:
• If I launch the three jobs, only one of them gets to RUNNING state, the one called
Router, and it is always the same one. The others stay in ACCEPTED until they are
killed by the system. I have seen this error:
  o Container [pid=23007,containerID=container_1443454508386_0003_01_000001] is running beyond virtual memory limits. Current usage: 13.9 MB of 256 MB physical memory used; 1.1 GB of 537.6 MB virtual memory used. Killing container
• When I kill the jobs with the kill-yarn-job.sh script, the java process does not
get killed (see the commands right after this list for how I clean them up by hand).
• Although I have set in the options that the job should be launched with -Xms128M
-Xmx128M, I see that it runs with -Xmx768M. I have even changed the run-class.sh
script, but nothing changes.
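In case it is useful, this is how I check for and kill the leftover java processes by
hand (the grep pattern is simply what I use, and the PID is a placeholder):
ps -ef | grep -i samza | grep -v grep
kill <pid>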
Some of the things I am describing do not make sense to me, so I am lost as to what
to do or where to look.
Thanks for your help,
Jordi
-----Original Message-----
From: Jordi Blasi Uribarri [mailto:[email protected]]
Sent: Monday, September 28, 2015 11:26
To: [email protected]
Subject: RE: container is running beyond virtual memory limits
I just changed the task options file to add the following line:
task.opts=-Xmx128M
And I found no change in the behaviour. I see that the job is still being launched
with the default -Xmx768M value:
root 8296 8294 1 11:16 ? 00:00:05
/usr/lib/jvm/java-7-openjdk-amd64/bin/java -server
-Dsamza.container.name=samza-application-master
-Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_1443431699703_0003/container_1443431699703_0003_01_000001
-Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/tmp
-Xmx768M -XX:+PrintGCDateStamps
-Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1443431699703_0003/container_1443431699703_0003_01_000001/gc.log
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10241024
-d64 -cp
/opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/DataAnalyzer-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/DataAnalyzer-0.0.1-jar-with-dependencies.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-core-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-jaxrs-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1443431699703_0003/container_1443431699703_0003_01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar
org.apache.samza.job.yarn.SamzaAppMaster
How do I set the correct value?
Thanks,
Jordi
-----Original Message-----
From: Yi Pan [mailto:[email protected]]
Sent: Monday, September 28, 2015 10:56
To: [email protected]
Subject: Re: container is running beyond virtual memory limits
Hi, Jordi,
Please find the config variable task.opts in this table:
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html
This allows you to add additional JVM opts when launching the containers.
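For example, something like the following in the job's properties file (the value is
only an illustration):
task.opts=-Xms128M -Xmx128M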
-Yi
On Mon, Sep 28, 2015 at 1:48 AM, Jordi Blasi Uribarri <[email protected]>
wrote:
> The three tasks have a similar options file, like this one.
>
> task.class=flow.OperationJob
> job.name=flow.OperationJob
> job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
> yarn.package.path=http://IP/javaapp.tar.gz
>
>
> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> systems.kafka.consumer.zookeeper.connect=kfk-kafka01:2181,kfk-kafka02:2181
>
> systems.kafka.producer.bootstrap.servers=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:9093
>
> systems.kafka.producer.metadata.broker.list=kfk-kafka01:9092,kfk-kafka01:9093,kfk-kafka02:9092,kfk-kafka02:909
>
>
> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
> task.checkpoint.system=kafka
> task.inputs=kafka.operationtpc
>
>
> serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
>
> serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
>
> systems.kafka.samza.msg.serde=string
> systems.kafka.streams.tracetpc.samza.msg.serde=json
>
> yarn.container.memory.mb=256
> yarn.am.container.memory.mb=256
>
> task.commit.ms=1000
> task.window.ms=60000
>
> Where do I have to change the -Xmx parameter?
>
> Thanks.
>
> Jordi
>
>
> -----Original Message-----
> From: Yi Pan [mailto:[email protected]]
> Sent: Monday, September 28, 2015 10:39
> To: [email protected]
> Subject: Re: container is running beyond virtual memory limits
>
> Hi, Jordi,
>
> Can you post your task.opts settings as well? The Xms and Xmx JVM opts
> will play a role here as well. The Xmx size should be set to less than
> yarn.container.memory.mb.
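> For example, a consistent combination would be something like (values are only
> illustrative):
> yarn.container.memory.mb=256
> task.opts=-Xmx128M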
>
> -Yi
>
> On Tue, Sep 22, 2015 at 4:32 AM, Jordi Blasi Uribarri
> <[email protected]>
> wrote:
>
> > I am seeing that I cannot get even a single job running. I have
> > restored the original configuration of yarn-site.xml and
> > capacity-scheduler.xml and that does not work. I am thinking that
> > maybe there is some kind of information related to old jobs that
> > was not correctly cleaned up when they were killed. Is there any
> > place where I can look to remove temporary files or something similar?
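> > The only candidate I have found myself is the node manager local
> > directory that shows up in the logs, e.g.:
> > ls /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/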
> >
> > Thanks
> >
> > jordi
> >
> > -----Original Message-----
> > From: Jordi Blasi Uribarri [mailto:[email protected]]
> > Sent: Tuesday, September 22, 2015 10:06
> > To: [email protected]
> > Subject: container is running beyond virtual memory limits
> >
> > Hi,
> >
> > I am not really sure if this is related to any of the previous
> > questions, so I am asking it in a new message. I am running three
> > different Samza jobs that perform different actions and exchange
> > information. As I found memory limits that were preventing the
> > jobs from getting from ACCEPTED to RUNNING, I introduced some
> > configuration changes in YARN, as suggested on this list:
> >
> >
> > yarn-site.xml
> >
> > <configuration>
> > <property>
> > <name>yarn.scheduler.minimum-allocation-mb</name>
> > <value>128</value>
> > <description>Minimum limit of memory to allocate to each
> > container request at the Resource Manager.</description>
> > </property>
> > <property>
> > <name>yarn.scheduler.maximum-allocation-mb</name>
> > <value>512</value>
> > <description>Maximum limit of memory to allocate to each
> > container request at the Resource Manager.</description>
> > </property>
> > <property>
> > <name>yarn.scheduler.minimum-allocation-vcores</name>
> > <value>1</value>
> > <description>The minimum allocation for every container request
> > at the RM, in terms of virtual CPU cores. Requests lower than this
> > won't take effect, and the specified value will get allocated the
> > minimum.</description>
> > </property>
> > <property>
> > <name>yarn.scheduler.maximum-allocation-vcores</name>
> > <value>2</value>
> > <description>The maximum allocation for every container request
> > at the RM, in terms of virtual CPU cores. Requests higher than this
> > won't take effect, and will get capped to this value.</description>
> > </property>
> > <property>
> > <name>yarn.resourcemanager.hostname</name>
> > <value>kfk-samza01</value>
> > </property>
> > </configuration>
> >
> > capacity-scheduler.xml
> > Altered value:
> > <property>
> > <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
> > <value>0.5</value>
> > <description>
> > Maximum percent of resources in the cluster which can be used
> > to
> run
> > application masters i.e. controls number of concurrent running
> > applications.
> > </description>
> > </property>
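> > My rough arithmetic for this value (only my own estimate): with two
> > node managers at 2048 MB each, 0.5 allows 2 x 2048 MB x 0.5 = 2048 MB
> > for application masters, i.e. room for about eight AMs at
> > yarn.am.container.memory.mb=256.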
> >
> > The jobs are configured to reduce the memory usage:
> >
> > yarn.container.memory.mb=256
> > yarn.am.container.memory.mb=256
> >
> > After introducing these changes I experienced a very noticeable
> > reduction in speed. That seemed normal, as the memory assigned to
> > the jobs was lowered and there were more of them running. Everything
> > was working until yesterday.
> >
> > What I am seeing today is that the jobs are not moving from ACCEPTED to
> > RUNNING. I have found the following in the log (full log at the end):
> >
> > 2015-09-22 09:54:36,661 INFO [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408))
> > - Memory usage of ProcessTree 10346 for container-id
> > container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical
> > memory used; 1.2 GB of 537.6 MB virtual memory used
> >
> > I am not sure where that 1.2 GB comes from, and it is what makes the processes die.
> >
> > Thanks,
> >
> > Jordi
> >
> >
> >
> >
> > 2015-09-22 09:54:36,519 INFO [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458))
> > - Removed ProcessTree with root 10271
> > 2015-09-22 09:54:36,519 INFO [AsyncDispatcher event handler]
> > container.Container (ContainerImpl.java:handle(999)) - Container
> > container_1442908447829_0002_01_000001 transitioned from RUNNING to
> > KILLING
> > 2015-09-22 09:54:36,533 INFO [AsyncDispatcher event handler]
> > launcher.ContainerLaunch
> > (ContainerLaunch.java:cleanupContainer(370))
> > - Cleaning up container container_1442908447829_0002_01_000001
> > 2015-09-22 09:54:36,661 INFO [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408))
> > - Memory usage of ProcessTree 10346 for container-id
> > container_1442908447829_0001_01_000001: 70.0 MB of 256 MB physical
> > memory used; 1.2 GB of 537.6 MB virtual memory used
> > 2015-09-22 09:54:36,661 WARN [Container Monitor]
> > monitor.ContainersMonitorImpl
> > (ContainersMonitorImpl.java:isProcessTreeOverLimit(293)) - Process
> > tree for
> > container: container_1442908447829_0001_01_000001 running over twice
> > the configured limit. Limit=563714432, current usage = 1269743616
> > 2015-09-22 09:54:36,662 WARN [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(447))
> > - Container
> > [pid=10346,containerID=container_1442908447829_0001_01_000001] is
> > running beyond virtual memory limits. Current usage: 70.0 MB of 256
> > MB physical memory used; 1.2 GB of 537.6 MB virtual memory used.
> > Killing
> container.
> > Dump of the process-tree for container_1442908447829_0001_01_000001 :
> > |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> > SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> > |- 10346 10344 10346 10346 (java) 253 7 1269743616 17908
> > /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server
> > -Dsamza.container.name=samza-application-master
> > -Dsamza.log.dir=/opt/hadoop-2.6.0/logs/userlogs/application_14429084
> > 47
> > 829_0001/container_1442908447829_0001_01_000001
> > -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcac
> > he
> > /application_1442908447829_0001/container_1442908447829_0001_01_0000
> > 01
> > /__package/tmp
> > -Xmx768M -XX:+PrintGCDateStamps
> > -Xloggc:/opt/hadoop-2.6.0/logs/userlogs/application_1442908447829_00
> > 01 /container_1442908447829_0001_01_000001/gc.log
> > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> > -XX:GCLogFileSize=10241024 -d64 -cp
> > /opt/hadoop-2.6.0/conf:/tmp/hadoop-root/nm-local-dir/usercache/root/
> > ap
> > pcache/application_1442908447829_0001/container_1442908447829_0001_0
> > 1_
> > 000001/__package/lib/jackson-annotations-2.6.0.jar:/tmp/hadoop-root/
> > nm
> > -local-dir/usercache/root/appcache/application_1442908447829_0001/co
> > nt
> > ainer_1442908447829_0001_01_000001/__package/lib/jackson-core-2.6.0.
> > ja
> > r:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_
> > 14
> > 42908447829_0001/container_1442908447829_0001_01_000001/__package/li
> > b/
> > jackson-databind-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/r
> > oo
> > t/appcache/application_1442908447829_0001/container_1442908447829_00
> > 01
> > _01_000001/__package/lib/jackson-dataformat-smile-2.6.0.jar:/tmp/had
> > oo
> > p-root/nm-local-dir/usercache/root/appcache/application_144290844782
> > 9_
> > 0001/container_1442908447829_0001_01_000001/__package/lib/jackson-ja
> > xr
> > s-json-provider-2.6.0.jar:/tmp/hadoop-root/nm-local-dir/usercache/ro
> > ot
> > /appcache/application_1442908447829_0001/container_1442908447829_000
> > 1_
> > 01_000001/__package/lib/jackson-module-jaxb-annotations-2.6.0.jar:/t
> > mp
> > /hadoop-root/nm-local-dir/usercache/root/appcache/application_144290
> > 84
> > 47829_0001/container_1442908447829_0001_01_000001/__package/lib/nxtB
> > ro
> > ker-0.0.1.jar:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/
> > ap
> > plication_1442908447829_0001/container_1442908447829_0001_01_000001/
> > __ package/lib/nxtBroker-0.0.1-jar-with-dependencies.jar
> > org.apache.samza.job.yarn.SamzaAppMaster
> >
> > 2015-09-22 09:54:36,663 INFO [Container Monitor]
> > monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458))
> > - Removed ProcessTree with root 10346
> > 2015-09-22 09:54:36,663 INFO [AsyncDispatcher event handler]
> > container.Container (ContainerImpl.java:handle(999)) - Container
> > container_1442908447829_0001_01_000001 transitioned from RUNNING to
> > KILLING
> > 2015-09-22 09:54:36,663 INFO [AsyncDispatcher event handler]
> > launcher.ContainerLaunch
> > (ContainerLaunch.java:cleanupContainer(370))
> > - Cleaning up container container_1442908447829_0001_01_000001
> > ________________________________
> > Jordi Blasi Uribarri
> > R&D&I Department
> >
> > [email protected]
> > Bilbao Office
> >