Thank you, Bharath. Regarding my 2nd question, perhaps the following scenario can help to illustrate what I am looking to achieve:
Input stream A -> Job 1 -> Output stream B (Kafka Topic B)
Input stream A -> Job 2 -> Output stream C
Input stream D -> Job 3 -> Output stream B (Kafka Topic B)
Input stream B (Kafka Topic B) -> Elasticsearch (or write to HDFS)

In the case of "Input stream B (Kafka Topic B) -> Elasticsearch (or write
to HDFS)", this is what I was referring to as "common/shared system
services": a job that has no transformation logic other than sinking
messages to either Elasticsearch or HDFS using Samza's systems/connectors.

In other words, Job 1 and Job 3 both output to "Output stream B" expecting
messages to be persisted in Elasticsearch or HDFS. Would I need to specify
the system/connector configuration separately in Job 1 and Job 3? Or is
there a way to have "Input stream B (Kafka Topic B) -> Elasticsearch (or
write to HDFS)" as its own stand-alone job, so I can have the following:

RESTful web services (or other non-Samza services/applications) as Kafka
producer -> Input stream B (Kafka Topic B) -> Elasticsearch (or write to
HDFS)

Regards,

Eric

On Mon, Oct 14, 2019 at 8:35 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi Eric,
>
> Answers to your questions are as follows:
>
> > *Can I, or is it recommended to, package multiple jobs as 1 deployment
> > with 1 properties file or keep each app separated? Based on the
> > documentation, it appears to support 1 app/job within a single
> > configuration as there is no mechanism to assign multiple app classes
> > and give each a name unless I am mistaken*
>
> *app.class* is a single-valued configuration, and your understanding of
> it based on the documentation is correct.
> > *If only 1 app per config+deployment, what is the best way to handle
> > requirement #3 - common/shared system services - as there is no app or
> > job per se, I just need to specify the streams and output system (ie
> > org.apache.samza.system.hdfs.writer*
>
> There are a couple of options to achieve your #3 requirement.
>
> 1. If there is enough commonality between your jobs, you could have one
>    application class that describes the logic and have different
>    configurations modify the behavior of the application logic. This
>    does come with some of the following considerations:
>    1. Your deployment system needs to support deploying the same
>       application with different configs.
>    2. Potential duplication of configuration if your configuration
>       system doesn't support hierarchies and overrides.
>    3. Potentially unmanageable for evolution, since a change in the
>       application affects multiple jobs and requires extensive testing
>       across different configurations.
> 2. You could potentially have libraries that perform pieces of the
>    business logic and have your different jobs leverage them using
>    composition. Some things to consider with this option:
>    1. Your applications and configurations stay isolated.
>    2. You could still leverage some of the common configurations if
>       your configuration system supports hierarchies and overrides.
>    3. This alleviates concerns over evolution and testing as long as
>       the changes are application specific.
>
> I am still unclear about the second part of your 2nd question.
> Do you mean to say all your jobs consume from the same sources, write to
> the same sinks, and only your processing logic is different?
>
> > *common/shared system services as there is no app or job per se, I
> > just need to specify the streams and output system*
>
> Also, I am not sure I follow what you mean by "*there is no app or
> job*". You still have 1 app per config + deployment, right?
>
> Thanks,
> Bharath
>
> On Thu, Oct 10, 2019 at 9:46 AM Eric Shieh <datosl...@gmail.com> wrote:
>
> > Hi,
> >
> > I am new to Samza; I am evaluating Samza as the backbone for my
> > streaming CEP requirement. I have:
> >
> > 1. Multiple data enrichment and ETL jobs
> > 2. Multiple domain-specific CEP rulesets
> > 3. Common/shared system services, like consuming topics/streams and
> > persisting the messages in Elasticsearch and HDFS.
> >
> > My questions are:
> >
> > 1. Can I, or is it recommended to, package multiple jobs as 1
> > deployment with 1 properties file, or keep each app separated? Based
> > on the documentation, it appears to support 1 app/job within a single
> > configuration, as there is no mechanism to assign multiple app classes
> > and give each a name, unless I am mistaken.
> > 2. If only 1 app per config+deployment, what is the best way to handle
> > requirement #3 - common/shared system services - as there is no app or
> > job per se; I just need to specify the streams and output system (ie
> > org.apache.samza.system.hdfs.writer.BinarySequenceFileHdfsWriter or
> > org.apache.samza.system.elasticsearch.indexrequest.DefaultIndexRequestFactory).
> > Given it's a common shared system service not tied to specific jobs,
> > can it be deployed without an app?
> >
> > Thank you in advance for your help; looking forward to learning more
> > about Samza and developing this critical feature using Samza!
> >
> > Regards,
> >
> > Eric
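
For reference, a stand-alone "pass-through" sink job of the kind described
above might be configured roughly along these lines. This is only a sketch:
the job name, system names, and hosts are placeholders, and the
configuration keys and factory class names (beyond the two classes already
quoted in the thread) are taken from memory of the Samza documentation and
should be verified against the Samza version in use. Note also that Samza
still requires a task or application class even when the job performs no
transformation, so a trivial forwarding task is assumed here.

```properties
# Hypothetical shared sink job: Kafka Topic B -> Elasticsearch.
# All names below are placeholders, not from the thread.
job.name=shared-sink-to-es
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory

# A trivial task that forwards each incoming message to the output
# system; com.example.PassThroughTask is a hypothetical class you would
# write yourself (Samza has no built-in no-op sink task).
task.class=com.example.PassThroughTask
task.inputs=kafka.topic-B

# Input system: Kafka (key names assumed from the Samza docs).
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=zk-host:2181
systems.kafka.producer.bootstrap.servers=kafka-host:9092

# Output system: Elasticsearch, using the index request factory class
# mentioned earlier in the thread.
systems.es.samza.factory=org.apache.samza.system.elasticsearch.ElasticsearchSystemFactory
systems.es.index.request.factory=org.apache.samza.system.elasticsearch.indexrequest.DefaultIndexRequestFactory
systems.es.client.factory=org.apache.samza.system.elasticsearch.client.TransportClientFactory
systems.es.client.transport.host=es-host
systems.es.client.transport.port=9300
```

If the sink configuration lives in this one job, Job 1 and Job 3 would only
need the Kafka producer configuration for Topic B; neither would need to
know anything about Elasticsearch or HDFS.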