Hey, Cameron,

Thanks for the detailed answers. It would be good to add this explanation
to the SEP page as well.

Otherwise, +1 from my side. Thanks!

-Yi

On Mon, Mar 16, 2020 at 10:06 AM Cameron Lee <cameronlee...@gmail.com>
wrote:

> You have the correct understanding about the "yarn.resources.*"
> configuration, and your question is a good one. Currently, the
> implementation is that Samza will look in a specific place on the file
> system (i.e. <current working directory>/__samzaFrameworkApi and <current
> working directory>/__samzaFrameworkInfrastructure) to get the
> API/infrastructure classpaths. I have a TODO in the code to make the file
> system location configurable (or specified through an environment
> variable). The configuration or environment variable for the file system
> location would not be YARN-specific, and it would be applicable to any
> execution environment.
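
A minimal sketch of the lookup described above. Only the two directory names come from the thread. The environment variable SAMZA_FRAMEWORK_RESOURCES_DIR is a hypothetical name for the override that is still a TODO, and the class and method names are illustrative, not Samza APIs.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Sketch only: resolves the framework API/infrastructure directories. */
final class FrameworkResourceLocator {
    // Directory names as given in the thread.
    static final String API_DIR = "__samzaFrameworkApi";
    static final String INFRASTRUCTURE_DIR = "__samzaFrameworkInfrastructure";
    // Hypothetical override; the thread only mentions this as a TODO.
    static final String BASE_DIR_ENV = "SAMZA_FRAMEWORK_RESOURCES_DIR";

    /** Returns the override directory if set, else the current working directory. */
    static Path baseDir(Map<String, String> env) {
        String override = env.get(BASE_DIR_ENV);
        if (override != null && !override.isEmpty()) {
            return Paths.get(override);
        }
        return Paths.get(System.getProperty("user.dir"));
    }

    /** Directory expected to hold the framework API jars. */
    static Path apiDir(Map<String, String> env) {
        return baseDir(env).resolve(API_DIR);
    }

    /** Directory expected to hold the framework infrastructure jars. */
    static Path infrastructureDir(Map<String, String> env) {
        return baseDir(env).resolve(INFRASTRUCTURE_DIR);
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println("API: " + apiDir(env));
        System.out.println("Infrastructure: " + infrastructureDir(env));
    }
}
```

Because the override is read from a plain map of environment values, nothing in this sketch is YARN-specific, which matches the intent stated above.
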
>
> On Wed, Mar 11, 2020 at 10:54 PM Yi Pan <nickpa...@gmail.com> wrote:
>
> > OK. If I understand correctly, your answer is the following: the
> > yarn.resources.* configuration variables are used by the YARN localizer
> > to make the API and infrastructure classpaths available, together with
> > the application's own classpath, which is also determined by the YARN
> > localizer.
> > The question here is: how do we let the container JVM know the
> > API/infrastructure classpaths when launching the container processes? If
> > the API and infrastructure classpaths (i.e. the installation paths
> > determined by the localizer) are customizable, then we would need to tell
> > the container JVM those API/infra classpaths via some configuration
> > variables as well, right? Hence, those configuration variable names need
> > to be understood by the Samza application's code (which runs within the
> > container) as well. If not, what mechanism will we use to let the
> > container JVM process know where the YARN localizer has put the
> > API/infra classpaths?
> >
> > Thanks!
> >
> > -Yi
> >
> >
> >
> > On Wed, Mar 11, 2020 at 8:09 PM Cameron Lee <cameronlee...@gmail.com>
> > wrote:
> >
> > > The configuration variables are only used by the YARN localizer. The
> > > Samza application will look for the framework resources in certain
> > > places in the application's working directory when it needs to access
> > > them. My aim is to do something similar to how "yarn.package.path"
> > > works. In other execution environments, it is my understanding that
> > > "yarn.package.path" would get replaced by a different
> > > environment-specific configuration key/value.
> > > I agree that we should not use "yarn.resources.*" if the configurations
> > > are not YARN-specific. Do you think that these resource localization
> > > configs are generalizable to arbitrary environments? If so, does that
> > > mean "yarn.package.path" is also generalizable? For example, what if
> > > some execution environment does not use URLs to specify resource
> > > locations (although maybe this isn't a reasonable concern)?
> > >
> > > Thanks,
> > > Cameron
> > >
> > > On Wed, Mar 11, 2020 at 4:43 PM Yi Pan <nickpa...@gmail.com> wrote:
> > >
> > > > Hi, Cameron,
> > > >
> > > > Thanks for the quick responses! Appreciate it.
> > > >
> > > > I still have a concern about a): are those configuration variables
> > > > used by the YARN localizer or by Samza applications? If they are used
> > > > only by the YARN localizer, I agree that we should keep them
> > > > YARN-specific. Otherwise, I think it would still be better to name
> > > > them cluster.based.resources.*. The reason is that Samza applications
> > > > are supposed to be able to run on different execution environments.
> > > > Ideally, when we deploy the same Samza application on YARN vs. Mesos
> > > > or managed K8s clusters, we should only need to change the
> > > > configuration values, not both the configuration variable names and
> > > > the values. Does that make sense? Otherwise, we can schedule a conf
> > > > call to clarify that.
> > > >
> > > > Thanks!
> > > >
> > > > -Yi
> > > >
> > > > On Tue, Mar 10, 2020 at 3:25 PM Cameron Lee <cameronlee...@gmail.com>
> > > > wrote:
> > > >
> > > > > a) The "yarn.resources.*" configs are for localizing the necessary
> > > > > resources into the working directory for the process. I felt that
> > > > > the specific configuration format to specify these resources might
> > > > > be YARN-specific (e.g. YARN has type and visibility configs for each
> > > > > of its resources), so a generic format might not apply. In a
> > > > > non-YARN case, the localization configs would need to be specified
> > > > > according to the technology being used.
> > > > > b) It is correct that the Avro version will need to be compatible
> > > > > with the version that is used by the infrastructure, if the
> > > > > infrastructure needs to use Avro and pass the Avro object to the
> > > > > application. This is the case with any serde technology that needs
> > > > > to be used. For the job coordinator, it is not much of a concern
> > > > > anyway, since it is not doing serde of Avro messages. This may be
> > > > > more of a concern for general split deployment, which will impact
> > > > > the processing containers, and will be a separate SEP.
> > > > > c) It should work to leave infrastructure serdes in the
> > > > > infrastructure classpath. The infrastructure serdes just see generic
> > > > > types (which are java.lang.Object at runtime) for the messages, and
> > > > > they don't do anything with the concrete types. So in the
> > > > > infrastructure classes, the messages get passed around as Object,
> > > > > but their concrete classes can still be loaded from the application.
> > > > > As with (b), this is more of a concern for general split deployment,
> > > > > since the job coordinator doesn't do message serde. I have run some
> > > > > tests regarding this classloading pattern, but we will do further
> > > > > verification for general split deployment.
> > > > > d) Yes, you are correct. Good catch. It should be "described above
> > > > > at Application classloader".
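
A small sketch of the point in (c): generic infrastructure-style code handles messages as java.lang.Object references, while each object keeps its concrete runtime class. All names below are hypothetical stand-ins rather than Samza classes, and everything is loaded by a single classloader for brevity, so this shows only the erasure behaviour, not the split-classloader setup from the SEP.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: all names here are hypothetical, not Samza classes. */
final class ErasureSketch {
    /** Stand-in for infrastructure code: generic, never names a concrete message class. */
    static final class Buffer<T> {
        private final List<T> items = new ArrayList<>();

        void add(T item) {
            items.add(item);
        }

        T first() {
            return items.get(0);
        }
    }

    /** Stand-in for a message class that would live on the application classpath. */
    static final class PageView {
        final String url;

        PageView(String url) {
            this.url = url;
        }
    }

    /** Infrastructure-style helper: it only ever handles Object references. */
    static Object relay(Object message) {
        Buffer<Object> buffer = new Buffer<>();
        buffer.add(message);
        return buffer.first();
    }

    public static void main(String[] args) {
        Object out = relay(new PageView("/home"));
        // The reference type is Object, but the runtime class is unchanged and is
        // still associated with whichever classloader defined PageView.
        System.out.println(out.getClass().getSimpleName());
        System.out.println(out.getClass().getClassLoader() == PageView.class.getClassLoader());
    }
}
```
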
> > > > >
> > > > > Thanks for all of your questions. I will clarify some details in
> > > > > the doc regarding your questions.
> > > > >
> > > > > Cameron
> > > > >
> > > > > On Mon, Mar 9, 2020 at 12:07 PM Yi Pan <nickpa...@gmail.com> wrote:
> > > > >
> > > > > > Hi, Cameron,
> > > > > >
> > > > > > Sorry to chime in late. Overall, looks great! I do have a few
> > > > > > suggestions/questions before I can cast my vote here:
> > > > > > a) For the configuration variable names, why are we limiting
> > > > > > ourselves to yarn.resources.*? We have changed some of the
> > > > > > configuration variables from YARN-specific to non-YARN-specific. I
> > > > > > would love to keep that consistent (i.e. gradually moving all our
> > > > > > YARN-specific configuration variables to non-YARN-specific names).
> > > > > > b) For the Avro case referred to in the delegation case in the
> > > > > > Infrastructure classloader: if we delegate the object
> > > > > > deserialization class to the application classloader, would it be
> > > > > > possible that the application provides a version of the Avro
> > > > > > classes that is incompatible with the one used within the
> > > > > > "infrastructure plugins", hence causing a runtime exception in the
> > > > > > infrastructure plugin? Or is the solution to not directly use
> > > > > > serde classes in the infrastructure code?
> > > > > > c) Following the description of the infrastructure classloader
> > > > > > flow, where should we expect the serde classes? In the application
> > > > > > classpath, I guess? So, does that mean that we should exclude
> > > > > > serde classes (including SerializableSerde and JsonSerdeV2) from
> > > > > > the Samza infrastructure package, and tell the users to package
> > > > > > them in the application package?
> > > > > > d) I am a bit confused about the description of "multiple"
> > > > > > application classloaders on the job coordinator: one is for the
> > > > > > describe flow and the other is in the "Application" classloader,
> > > > > > instead of the "API" classloader, right?
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > -Yi
> > > > > >
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 11:32 AM Ke Wu <ke.wu...@gmail.com> wrote:
> > > > > >
> > > > > > > +1.
> > > > > > >
> > > > > > > Thanks for driving this effort.
> > > > > > >
> > > > > > > Best,
> > > > > > > Ke
> > > > > > >
> > > > > > > > On Mar 3, 2020, at 6:28 PM, Jagadish Venkatraman
> > > > > > > > <jagadish1...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > +1 binding.
> > > > > > > >
> > > > > > > > Thanks Cameron. I look forward to this feature taking our
> > > > > > > > "Stream Processing as a service" offering to the next level.
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > On Tuesday, March 3, 2020, Prateek Maheshwari
> > > > > > > > <prate...@utexas.edu> wrote:
> > > > > > > >
> > > > > > > >> +1 (binding) from me. Thanks for contributing this feature.
> > > > > > > >> Looking forward to having dependency isolation and to the
> > > > > > > >> ability to upgrade the framework independently from an
> > > > > > > >> application.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Prateek
> > > > > > > >>
> > > > > > > >> On Fri, Feb 28, 2020 at 10:48 AM Cameron Lee
> > > > > > > >> <cameronlee...@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >>> Hi all,
> > > > > > > >>>
> > > > > > > >>> This is a call for a vote on SEP-24: Cluster-based Job
> > > > > > > >>> Coordinator Dependency Isolation. Thanks to everyone who
> > > > > > > >>> reviewed the proposal and provided feedback.
> > > > > > > >>>
> > > > > > > >>> I have addressed comments on the SEP, and I am not aware of
> > > > > > > >>> any further major questions or objections, so I am starting
> > > > > > > >>> this vote.
> > > > > > > >>>
> > > > > > > >>> SEP link:
> > > > > > > >>>
> > > > > > > >>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-24%3A+Cluster-based+Job+Coordinator+Dependency+Isolation
> > > > > > > >>>
> > > > > > > >>> Discuss thread:
> > > > > > > >>>
> > > > > > > >>> https://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/%3cCAMja7KeGcRZ3H95Rxk5XE=60zxm6jxjkjuwwxmgmadpfbyk...@mail.gmail.com%3e
> > > > > > > >>>
> > > > > > > >>> There was also some discussion through comments on the SEP
> > > > > > > >>> page (see Resolved Comments).
> > > > > > > >>>
> > > > > > > >>> Please vote:
> > > > > > > >>> [ ] +1 approve
> > > > > > > >>> [ ] +0 no opinion
> > > > > > > >>> [ ] -1 disapprove (and reason why)
> > > > > > > >>>
> > > > > > > >>> Thank you,
> > > > > > > >>> Cameron
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Jagadish
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
