Hey, Cameron, Thanks for the detailed answers. It would be good to add this explanation to the SEP page as well.
Otherwise, +1 from my side. Thanks!

-Yi

On Mon, Mar 16, 2020 at 10:06 AM Cameron Lee <cameronlee...@gmail.com> wrote:

> You have the correct understanding about the "yarn.resources.*"
> configuration, and your question is a good one. Currently, the
> implementation is that Samza will look in a specific place on the file
> system (i.e. <current working directory>/__samzaFrameworkApi and
> <current working directory>/__samzaFrameworkInfrastructure) to get the
> API/infrastructure classpaths. I have a TODO in the code to make the file
> system location configurable (or specified through an environment
> variable). The configuration or environment variable for the file system
> location would not be YARN-specific, and it would be applicable to any
> execution environment.
>
> On Wed, Mar 11, 2020 at 10:54 PM Yi Pan <nickpa...@gmail.com> wrote:
>
> > OK. If I understand correctly, your answer is the following:
> > yarn.resources.* configuration variables are used by YARN localizer to
> > make API and infrastructure classpath available, together with the
> > application's own classpath, which is also determined by the YARN
> > localizer.
> > The question here is: how do we let the container JVM know the
> > API/infrastructure classpaths when launching the container processes? If
> > the API and infrastructure classpaths (i.e. installation path determined
> > by the localizer) are customizable, then we would need to tell the
> > container JVM those API/infra classpaths via some configuration
> > variables as well, right? Hence, those configuration variable names need
> > to be understood by the Samza application's code (which is run within
> > the container) as well.
> > If not, what's the mechanism that we will use to let the container JVM
> > process to know where the YARN localizer has put API/infra classpaths?
> >
> > Thanks!
> >
> > -Yi
> >
> > On Wed, Mar 11, 2020 at 8:09 PM Cameron Lee <cameronlee...@gmail.com>
> > wrote:
> >
> > > The configuration variables are only used by the YARN localizer. The
> > > Samza application will look for the framework resources in certain
> > > places in the application's working directory when it needs to access
> > > them. My aim is to do something similar to how "yarn.package.path"
> > > works. In other execution environments, it is my understanding that
> > > "yarn.package.path" would get replaced by a different
> > > environment-specific configuration key/value.
> > > I agree that we should not use "yarn.resources.*" if the
> > > configurations are not YARN-specific. Do you think that these resource
> > > localization configs are generalizable to arbitrary environments? If
> > > so, does that mean "yarn.package.path" is also generalizable? For
> > > example, what if some execution environment does not use URLs to
> > > specify resource locations (although maybe this isn't a reasonable
> > > concern to worry about?)?
> > >
> > > Thanks,
> > > Cameron
> > >
> > > On Wed, Mar 11, 2020 at 4:43 PM Yi Pan <nickpa...@gmail.com> wrote:
> > >
> > > > Hi, Cameron,
> > > >
> > > > Thanks for the quick responses! Appreciate it.
> > > >
> > > > I am still having a concern on a): are those configuration variables
> > > > used by YARN localizer or by Samza applications? If those are used
> > > > only by the YARN localizer, I agree that we should keep those as
> > > > yarn specific. Otherwise, I think it would still be better to name
> > > > those as cluster.based.resources.*. The reason being: Samza
> > > > applications are supposed to be able to run on different execution
> > > > environments. Ideally, when we are deploying the same Samza
> > > > application on YARN vs Mesos or managed K8s clusters, we should only
> > > > need to change the configuration values, not the configuration
> > > > variable names and values. Does it make sense?
> > > > Otherwise, we can schedule a conf call to clarify that.
> > > >
> > > > Thanks!
> > > >
> > > > -Yi
> > > >
> > > > On Tue, Mar 10, 2020 at 3:25 PM Cameron Lee
> > > > <cameronlee...@gmail.com> wrote:
> > > >
> > > > > a) The "yarn.resources.*" configs are for localizing the necessary
> > > > > resources into the working directory for the process. I felt that
> > > > > the specific configuration format to specify these resources might
> > > > > be YARN-specific (e.g. YARN has type and visibility configs for
> > > > > each of its resources), so a generic format might not apply. In a
> > > > > non-YARN case, the localization configs would need to be specified
> > > > > according to the technology being used.
> > > > > b) It is correct that the Avro version will need to be compatible
> > > > > with the version that is used by the infrastructure, if
> > > > > infrastructure needs to use Avro and pass the Avro object to the
> > > > > application. This is the case with any serde technology that needs
> > > > > to be used. For the job coordinator, it is not much of a concern
> > > > > anyways, since it is not doing serde of Avro messages. This may be
> > > > > more of a concern for general split deployment, which will impact
> > > > > the processing containers, and will be a separate SEP.
> > > > > c) It should work to leave infrastructure serdes in the
> > > > > infrastructure classpath. The infrastructure serdes just see
> > > > > generic types (which are java.lang.Object at runtime) for the
> > > > > messages, and they don't do anything with the concrete types, so
> > > > > in the infrastructure classes, the messages get passed around as
> > > > > Object, but their concrete classes can still be loaded
> > > > > from the application.
> > > > > As with (b), this is more of a concern for general split
> > > > > deployment, since the job coordinator doesn't do message serde. I
> > > > > have run some tests regarding this classloading pattern, but we
> > > > > will do further verification for general split deployment.
> > > > > d) Yes, you are correct. Good catch. It should be "described above
> > > > > at Application classloader".
> > > > >
> > > > > Thanks for all of your questions. I will clarify some details in
> > > > > the doc regarding your questions.
> > > > >
> > > > > Cameron
> > > > >
> > > > > On Mon, Mar 9, 2020 at 12:07 PM Yi Pan <nickpa...@gmail.com> wrote:
> > > > >
> > > > > > Hi, Cameron,
> > > > > >
> > > > > > Sorry to chime in late. Overall, looks great! I do have a few
> > > > > > suggestions/questions before I can cast my vote here:
> > > > > > a) for the configuration variable names, why are we limiting
> > > > > > ourselves to yarn.resource.*? We have changed some of the
> > > > > > configuration variables from yarn specific to non-yarn specific.
> > > > > > I would love to keep that consistent (i.e. gradually moving all
> > > > > > our yarn-specific configuration variables to non-yarn-specific
> > > > > > names)
> > > > > > b) for the avro case as referred to in the delegation case in
> > > > > > the Infrastructure classloader, if we delegate the object
> > > > > > deserialization class to the application classloader, would it
> > > > > > be possible that the application provides a non-compatible
> > > > > > version of avro class than the ones used within the
> > > > > > "infrastructure plugins" and hence causing runtime exception in
> > > > > > the infrastructure plugin? Or is the solution being: do not
> > > > > > directly use serde classes in the infrastructure code?
> > > > > > c) following the description of infrastructure classloader flow,
> > > > > > where should we expect the serde classes? In the application
> > > > > > classpath, I guess? So, does that mean that we should exclude
> > > > > > serde classes (including SerializableSerde and JsonSerdeV2) in
> > > > > > the Samza infrastructure package, and tell the users to package
> > > > > > them in application package?
> > > > > > d) I am a bit confused about the description on "multiple"
> > > > > > application classloaders on the job coordinator: one is for the
> > > > > > describe flow and the other is in the "Application" classloader,
> > > > > > instead of "API" classloader, right?
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > -Yi
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 11:32 AM Ke Wu <ke.wu...@gmail.com> wrote:
> > > > > >
> > > > > > > +1.
> > > > > > >
> > > > > > > Thanks for driving this effort.
> > > > > > >
> > > > > > > Best,
> > > > > > > Ke
> > > > > > >
> > > > > > > > On Mar 3, 2020, at 6:28 PM, Jagadish Venkatraman
> > > > > > > > <jagadish1...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > +1 binding.
> > > > > > > >
> > > > > > > > Thanks Cameron. I look forward to this feature taking our
> > > > > > > > "Stream Processing as a service" offering to the next level.
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > On Tuesday, March 3, 2020, Prateek Maheshwari
> > > > > > > > <prate...@utexas.edu> wrote:
> > > > > > > >
> > > > > > > >> +1 (binding) from me. Thanks for contributing this feature.
> > > > > > > >> Looking forward to having dependency isolation and to the
> > > > > > > >> ability to upgrade the framework independently from an
> > > > > > > >> application.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Prateek
> > > > > > > >>
> > > > > > > >> On Fri, Feb 28, 2020 at 10:48 AM Cameron Lee
> > > > > > > >> <cameronlee...@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >>> Hi all,
> > > > > > > >>>
> > > > > > > >>> This is a call for a vote on SEP-24: Cluster-based Job
> > > > > > > >>> Coordinator Dependency Isolation. Thanks to everyone who
> > > > > > > >>> reviewed the proposal and provided feedback.
> > > > > > > >>>
> > > > > > > >>> I have addressed comments on the SEP, and I am not aware
> > > > > > > >>> of any further major questions or objections, so I am
> > > > > > > >>> starting this vote.
> > > > > > > >>>
> > > > > > > >>> SEP link:
> > > > > > > >>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-24%3A+Cluster-based+Job+Coordinator+Dependency+Isolation
> > > > > > > >>>
> > > > > > > >>> Discuss thread:
> > > > > > > >>> https://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/%3cCAMja7KeGcRZ3H95Rxk5XE=60zxm6jxjkjuwwxmgmadpfbyk...@mail.gmail.com%3e
> > > > > > > >>>
> > > > > > > >>> There was also some discussion through comments on the SEP
> > > > > > > >>> page (see Resolved Comments).
> > > > > > > >>>
> > > > > > > >>> Please vote:
> > > > > > > >>> [ ] +1 approve
> > > > > > > >>> [ ] +0 no opinion
> > > > > > > >>> [ ] -1 disapprove (and reason why)
> > > > > > > >>>
> > > > > > > >>> Thank you,
> > > > > > > >>> Cameron
> > > > > > > >
> > > > > > > > --
> > > > > > > > Jagadish
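
For readers following the thread, the lookup Cameron describes (framework directories under the container's working directory, with a TODO to make the location configurable) could be sketched roughly as below. This is only an illustration and not the Samza implementation: the two directory names are the ones quoted in the thread, while the environment variable name SAMZA_FRAMEWORK_BASE_DIR, the class and method names, and the idea that jars sit directly inside each directory are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class FrameworkClasspathSketch {
  // Directory names as quoted in the thread.
  static final String API_DIR = "__samzaFrameworkApi";
  static final String INFRA_DIR = "__samzaFrameworkInfrastructure";

  // Hypothetical variable name: the thread only mentions a TODO for an
  // environment-independent override and does not name it.
  static final String BASE_DIR_ENV = "SAMZA_FRAMEWORK_BASE_DIR";

  // Use the override when it is set and non-empty, else the working directory.
  static Path baseDir(Map<String, String> env, Path workingDir) {
    String override = env.get(BASE_DIR_ENV);
    return (override == null || override.isEmpty()) ? workingDir : Paths.get(override);
  }

  // Assumption: jars sit directly inside the directory. A missing directory
  // yields an empty classpath instead of an error.
  static List<Path> jarsIn(Path dir) throws IOException {
    List<Path> jars = new ArrayList<>();
    if (!Files.isDirectory(dir)) {
      return jars;
    }
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
      for (Path jar : stream) {
        jars.add(jar);
      }
    }
    Collections.sort(jars); // stable ordering for reproducible classpaths
    return jars;
  }

  public static void main(String[] args) throws IOException {
    Path base = baseDir(System.getenv(), Paths.get("").toAbsolutePath());
    System.out.println("API classpath: " + jarsIn(base.resolve(API_DIR)));
    System.out.println("Infrastructure classpath: " + jarsIn(base.resolve(INFRA_DIR)));
  }
}
```

Because the override is read from the process environment and not from a "yarn.*" key, the same lookup would apply unchanged on YARN or any other execution environment, which is the property discussed in the thread.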