Thank you Bharath.  Regarding my 2nd question, perhaps the following
scenario can help to illustrate what I am looking to achieve:

Input stream A -> Job 1 -> Output stream B (Kafka Topic B)
Input stream A -> Job 2 -> Output stream C
Input stream D -> Job 3 -> Output stream B (Kafka Topic B)
Input stream B (Kafka Topic B) -> Elasticsearch (or write to HDFS)

In the case of "Input stream B (Kafka Topic B) -> Elasticsearch (or write
to HDFS)", this is what I was referring to as "common/shared system
services": a job that has no transformation logic and only sinks messages
to either Elasticsearch or HDFS using Samza's systems/connectors.  In
other words, Job 1 and Job 3 both output to "Output stream B" expecting
that the messages will be persisted in Elasticsearch or HDFS.  Would I
need to specify the system/connector configuration separately in Job 1 and
Job 3?  Or is there a way to have "Input stream B (Kafka Topic B) ->
Elasticsearch (or write to HDFS)" as its own stand-alone job, so that I
can have the following:

RESTful web services (or other non-Samza services/applications) acting as
a Kafka producer -> Input stream B (Kafka Topic B) -> Elasticsearch (or
write to HDFS)
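
For example, I was imagining the stand-alone sink job's configuration
would look something like the sketch below (the property names and values
are my guesses based on the documentation; I have not verified the exact
keys for the Elasticsearch connector or the broker addresses):

```properties
# Hypothetical stand-alone "Topic B -> Elasticsearch" sink job
app.name=stream-b-es-sink
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory

# Input: Kafka Topic B
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=localhost:2181
systems.kafka.producer.bootstrap.servers=localhost:9092
task.inputs=kafka.topic-b

# Output: Elasticsearch via Samza's connector
systems.es.samza.factory=org.apache.samza.system.elasticsearch.ElasticsearchSystemFactory
systems.es.index.request.factory=org.apache.samza.system.elasticsearch.indexrequest.DefaultIndexRequestFactory
```

That way, none of the upstream jobs (Job 1, Job 3, or external producers)
would need to know anything about Elasticsearch or HDFS.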

Regards,

Eric

On Mon, Oct 14, 2019 at 8:35 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi Eric,
>
> Answers to your questions are as follows
>
>
> >
> > *Can I, or is it recommended to, package multiple jobs as 1 deployment
> > with 1 properties file or keep each app separated?  Based on the
> > documentation, it appears to support 1 app/job within a single
> > configuration as there is no mechanism to assign multiple app classes
> > and give each a name unless I am mistaken*
> >
>
>  *app.class* is a single-valued configuration, so your understanding of
> it based on the documentation is correct.
>
>
> >
> > *If only 1 app per config+deployment, what is the best way to handle
> > requirement #3 - common/shared system services, as there is no app or
> > job per se; I just need to specify the streams and output system (ie
> > org.apache.samza.system.hdfs.writer*
> >
>
> There are a couple of options to achieve your requirement #3.
>
>    1. If there is enough commonality between your jobs, you could have one
>       application class that describes the logic, and use different
>       configurations to modify the behavior of the application logic. This
>       does come with the following considerations:
>       1. Your deployment system needs to support deploying the same
>          application with different configs.
>       2. Potential duplication of configuration if your configuration
>          system doesn't support hierarchies and overrides.
>       3. Potentially unmanageable for evolution, since a change in the
>          application affects multiple jobs and requires extensive testing
>          across different configurations.
>    2. You could have libraries that perform pieces of the business logic
>       and have your different jobs leverage them using composition. Some
>       things to consider with this option:
>       1. Your application and configuration stay isolated.
>       2. You could still leverage some of the common configurations if
>          your configuration system supports hierarchies and overrides.
>       3. Alleviates concerns over evolution and testing, as long as the
>          changes are application specific.
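>
> For illustration, option 1 might look like the same app.class deployed
> twice with different properties files (all names below are hypothetical,
> and the exact keys depend on your Samza version and deployment system):
>
> ```properties
> # job-1.properties (hypothetical): process stream A
> app.name=job-1
> app.class=com.example.CommonEnrichmentApp
> task.inputs=kafka.stream-a
>
> # job-3.properties (hypothetical): same app.class, different input
> app.name=job-3
> app.class=com.example.CommonEnrichmentApp
> task.inputs=kafka.stream-d
> ```
>
> Your deployment system would then submit each properties file as its own
> job, while the application code is shared.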
>
>
> I am still unclear about the second part of your 2nd question.
> Do you mean to say that all your jobs consume from the same sources and
> write to the same sinks, and only your processing logic is different?
>
>
> > *common/shared system services as there is no app or job per se, I just
> > need to specify the streams and output system*
>
>
> Also, I am not sure I follow what you mean by "*there is no app or
> job*".
> You still have 1 app per config + deployment, right?
>
> Thanks,
> Bharath
>
> On Thu, Oct 10, 2019 at 9:46 AM Eric Shieh <datosl...@gmail.com> wrote:
>
> > Hi,
> >
> > I am new to Samza and am evaluating it as the backbone for my streaming
> > CEP requirement.  I have:
> >
> > 1. Multiple data enrichment and ETL jobs
> > 2. Multiple domain specific CEP rulesets
> > 3. Common/shared system services like consuming topics/streams and
> > persisting the messages in ElasticSearch and HDFS.
> >
> > My questions are:
> >
> > 1. Can I, or is it recommended to, package multiple jobs as 1 deployment
> > with 1 properties file, or keep each app separated?  Based on the
> > documentation, it appears to support 1 app/job within a single
> > configuration, as there is no mechanism to assign multiple app classes
> > and give each a name, unless I am mistaken.
> > 2. If only 1 app per config+deployment, what is the best way to handle
> > requirement #3 - common/shared system services, as there is no app or
> > job per se; I just need to specify the streams and output system (ie
> > org.apache.samza.system.hdfs.writer.BinarySequenceFileHdfsWriter or
> > org.apache.samza.system.elasticsearch.indexrequest.DefaultIndexRequestFactory).
> > Given it's a common shared system service not tied to specific jobs, can
> > it be deployed without an app?
> >
> > Thank you in advance for your help; I am looking forward to learning
> > more about Samza and developing this critical feature using Samza!
> >
> > Regards,
> >
> > Eric
> >
>