Hi Debraj,

Clarifying a bit on Yi’s response, since it was referring to the physical
YARN container ID.

If there are N YARN containers, the samza.container.id values are generated
sequentially from 0 to N-1. This ID is meant to be durable, i.e., if a
particular container fails, the Samza AM will restart it with the same ID.
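
To make the durability point concrete, here is a toy model of the idea. It is
only a sketch and not Samza's actual code: the class, its methods, and the
second YARN container ID in main() are made up for illustration. The Samza
IDs stay fixed at "0" to "N-1", while the physical YARN container behind an ID
can change after a restart.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model only; not Samza's implementation. All names are hypothetical. */
public class ContainerIdSketch {
    // Samza container ID ("0".."N-1") -> YARN container ID it currently runs in.
    private final Map<String, String> placement = new LinkedHashMap<>();

    public ContainerIdSketch(int containerCount) {
        for (int i = 0; i < containerCount; i++) {
            placement.put(String.valueOf(i), null); // not placed on YARN yet
        }
    }

    /** Records that a Samza container was (re)started in the given YARN container. */
    public void place(String samzaId, String yarnContainerId) {
        if (!placement.containsKey(samzaId)) {
            throw new IllegalArgumentException("Unknown Samza container ID: " + samzaId);
        }
        placement.put(samzaId, yarnContainerId);
    }

    public String yarnContainerFor(String samzaId) {
        return placement.get(samzaId);
    }

    public Set<String> samzaIds() {
        return placement.keySet();
    }

    public static void main(String[] args) {
        ContainerIdSketch job = new ContainerIdSketch(3);
        // YARN container ID taken from the ps output quoted below.
        job.place("2", "container_e02_1619095810959_0006_10_000004");
        // Simulated restart after a failure; this YARN ID is invented for the example.
        job.place("2", "container_e02_1619095810959_0006_10_000007");
        System.out.println("Samza IDs: " + job.samzaIds());
        System.out.println("Samza ID 2 now runs in: " + job.yarnContainerFor("2"));
    }
}
```

The set of Samza IDs never changes across the restart; only the YARN container
recorded against ID 2 does.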

Having said that, you should just treat it as an opaque key that uniquely
identifies a container within a Samza job.
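
As a rough sketch of "opaque key" usage: the ps output quoted below shows the
ID being passed to the JVM as -Dsamza.container.id, so it can be read as a
plain system property. The class name, the fallback value, and the key format
are made up for the example, and I am not claiming this is the recommended
accessor.

```java
/** Hypothetical helper; reads the ID from the JVM system property and keeps it as a String. */
public class SamzaContainerIdReader {
    /** Returns the value passed via -Dsamza.container.id, or the fallback if it is not set. */
    public static String containerId(String fallback) {
        return System.getProperty("samza.container.id", fallback);
    }

    public static void main(String[] args) {
        String id = containerId("unknown");
        // Treat the ID as an opaque key: no Integer.parseInt, no arithmetic on it.
        String key = "container-" + id;
        System.out.println(key);
    }
}
```

Keeping it as a String means the code does not depend on the ID being numeric
or contiguous.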

Could you share some details on how you intend to use this?

On Friday, April 23, 2021, Debraj Manna <subharaj.ma...@gmail.com> wrote:

> Thanks, Yi for replying.
>
> I also checked that class. But Container#getId().toString()
> returns a string like container_e02_1619095810959_0006_10_000004, whereas I
> am seeing that samza.container.id is set as a system property to an integer
> like 0 or 1. Can you let me know how Container#getId().toString() is getting
> mapped to an integer?
>
> For example, below is the output of ps -ef for a Samza YARN container, and I
> am seeing *-Dsamza.container.id=2* and
> *-Dsamza.container.name=samza-container-2*:
>
> yarn      7706  7704  7 Apr22 ?        02:08:23
> /usr/lib/jvm/zulu-11-amd64/bin/java -Xmx8820M
> -XX:-OmitStackTraceInFastThrow -XX:NewRatio=8 -Xss256K
> -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/var/lib/heap-dumps/samzajobs
> -XX:NativeMemoryTracking=summary -Dio.netty.allocator.type=unpooled
> -Dio.grpc.netty.shaded.io.netty.allocator.type=unpooled -server
> *-Dsamza.container.id=2 -Dsamza.container.name=samza-container-2*
> -DisThreadContextMapInheritable=true
> -Dlog4j.configuration=file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/ubuntu/appcache/application_1619095810959_0006/container_e02_1619095810959_0006_10_000004/__package/lib/log4j.xml
> -Dsamza.log.dir=/var/log/hadoop-yarn/containers/application_1619095810959_0006/container_e02_1619095810959_0006_10_000004
> -Djava.io.tmpdir=/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/ubuntu/appcache/application_1619095810959_0006/container_e02_1619095810959_0006_10_000004/__package/tmp
> -cp
> /etc/hadoop/conf::/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/ubuntu/appcache/application_1619095810959_0006/container_e02_1619095810959_0006_10_000004/__package/lib/activation-1.1.jar:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/ubuntu/appcache/
>
>
> On Fri, Apr 23, 2021 at 11:09 AM Yi Pan <nickpa...@gmail.com> wrote:
>
> > Hi, Debraj,
> >
> > In YARN environment, Samza uses YARN generated containerIds as
> > environmental variables to set each container process's
> > samza.container.id.
> > i.e. when containers are requested by Samza AM process in YARN, YARN RM
> > will reply with a set of allocated container objects, which is of class
> > org.apache.hadoop.yarn.api.records.Container. That's the resource class to
> > uniquely identify a container in YARN and Container#getId().toString() is
> > the container ID string we set to samza.container.id.
> >
> > Best,
> >
> > -Yi
> >
> > On Wed, Apr 21, 2021 at 11:28 PM Debraj Manna <subharaj.ma...@gmail.com>
> > wrote:
> >
> > > The same has been asked on Stack Overflow
> > > <https://stackoverflow.com/questions/67207850/how-does-samza-generate-the-container-id-when-the-application-is-deployed-in-yar>
> > > also. Does anyone have any thoughts on this?
> > >
> > > On Wed, Apr 21, 2021 at 6:08 PM Debraj Manna <subharaj.ma...@gmail.com>
> > > wrote:
> > >
> > > > Hi
> > > >
> > > > Can someone let me know how "samza.container.id" is generated when a
> > > > Samza app is running in YARN?
> > > >
> > > > Thanks,
> > > >
> > > >
> > >
> >
>


-- 
Jagadish
