Hey Shekar,

Ah ha. In that case, do you expect your SamzaContainer to try to connect
to the RM at 127.0.0.1, or do you expect it to try to connect to some
remote RM? If you expect it to try to connect to a remote RM, it's not
doing that, perhaps because YARN_HOME isn't set.
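
If you're targeting a remote RM, one thing worth checking is the
yarn.resourcemanager.address visible to the submitting host, e.g. in
yarn-site.xml (the host below is a placeholder; 8032 is the default RM
port, which matches your log):

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>your-rm-host:8032</value>
  </property>

If no yarn-site.xml is on the client's classpath, I believe the client
falls back to the default address, which resolves to localhost, and that
would explain what you're seeing.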

If you go to your RM's web interface, how many active nodes do you see
listed?

Cheers,
Chris

On 9/2/14 2:17 PM, "Shekar Tippur" <[email protected]> wrote:

>Chris ..
>
>I am using a RHEL server to host all the components (YARN, Kafka, Samza).
>I don't have ACLs open to Wikipedia.
>I am following ..
>http://samza.incubator.apache.org/learn/tutorials/latest/run-hello-samza-without-internet.html
>
>- Shekar
>
>
>
>On Tue, Sep 2, 2014 at 2:13 PM, Chris Riccomini <
>[email protected]> wrote:
>
>> Hey Shekar,
>>
>> Can you try changing that to:
>>
>>   http://127.0.0.1:8088/cluster
>>
>>
>> And see if you can connect?
>>
>> Cheers,
>> Chris
>>
>> On 9/2/14 1:21 PM, "Shekar Tippur" <[email protected]> wrote:
>>
>> >Another observation ..
>> >
>> >http://10.132.62.185:8088/cluster shows that no applications are running.
>> >
>> >- Shekar
>> >
>> >
>> >
>> >
>> >On Tue, Sep 2, 2014 at 1:15 PM, Shekar Tippur <[email protected]>
>>wrote:
>> >
>> >> YARN seems to be running ..
>> >>
>> >> yarn      5462  0.0  2.0 1641296 161404 ?      Sl   Jun20  95:26
>> >> /usr/java/jdk1.6.0_31/bin/java -Dproc_resourcemanager -Xmx1000m
>> >> -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn
>> >> -Dhadoop.log.file=yarn-yarn-resourcemanager-pppdc9prd2y2.log
>> >> -Dyarn.log.file=yarn-yarn-resourcemanager-pppdc9prd2y2.log
>> >> -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop-yarn
>> >> -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA
>> >> -Djava.library.path=/usr/lib/hadoop/lib/native -classpath
>> >> /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*:/etc/hadoop/conf/rm-config/log4j.properties
>> >> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
>> >>
>> >> I can tail the Kafka topic as well ..
>> >>
>> >> deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-raw
>> >>
>> >>
>> >>
>> >>
>> >> On Tue, Sep 2, 2014 at 1:04 PM, Chris Riccomini <
>> >> [email protected]> wrote:
>> >>
>> >>> Hey Shekar,
>> >>>
>> >>> It looks like your job is hanging while trying to connect to the RM
>> >>> on your localhost. I thought you said your job was running in local
>> >>> mode. If so, it should be using the LocalJobFactory. If not, and you
>> >>> intend to run on YARN, is your YARN RM up and running on localhost?
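>> >>>
>> >>> For reference, the factory is picked in the job's .properties file; a
>> >>> minimal sketch (the local factory class name is my assumption for this
>> >>> Samza version; the YARN one is what your log shows):
>> >>>
>> >>>   # run in-process, no YARN
>> >>>   job.factory.class=org.apache.samza.job.local.LocalJobFactory
>> >>>   # or submit to the YARN RM
>> >>>   job.factory.class=org.apache.samza.job.yarn.YarnJobFactory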
>> >>>
>> >>> Cheers,
>> >>> Chris
>> >>>
>> >>> On 9/2/14 12:22 PM, "Shekar Tippur" <[email protected]> wrote:
>> >>>
>> >>> >Chris ..
>> >>> >
>> >>> >$ cat ./deploy/samza/undefined-samza-container-name.log
>> >>> >
>> >>> >2014-09-02 11:17:58 JobRunner [INFO] job factory:
>> >>> >org.apache.samza.job.yarn.YarnJobFactory
>> >>> >
>> >>> >2014-09-02 11:17:59 ClientHelper [INFO] trying to connect to RM
>> >>> >127.0.0.1:8032
>> >>> >
>> >>> >2014-09-02 11:17:59 NativeCodeLoader [WARN] Unable to load native-hadoop
>> >>> >library for your platform... using builtin-java classes where applicable
>> >>> >
>> >>> >2014-09-02 11:17:59 RMProxy [INFO] Connecting to ResourceManager at
>> >>> >/127.0.0.1:8032
>> >>> >
>> >>> >
>> >>> >and Log4j ..
>> >>> >
>> >>> ><!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
>> >>> >
>> >>> ><log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
>> >>> >
>> >>> >  <appender name="RollingAppender"
>> >>> >            class="org.apache.log4j.DailyRollingFileAppender">
>> >>> >     <param name="File"
>> >>> >            value="${samza.log.dir}/${samza.container.name}.log" />
>> >>> >     <param name="DatePattern" value="'.'yyyy-MM-dd" />
>> >>> >     <layout class="org.apache.log4j.PatternLayout">
>> >>> >      <param name="ConversionPattern"
>> >>> >             value="%d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n" />
>> >>> >     </layout>
>> >>> >  </appender>
>> >>> >
>> >>> >  <root>
>> >>> >    <priority value="info" />
>> >>> >    <appender-ref ref="RollingAppender"/>
>> >>> >  </root>
>> >>> >
>> >>> ></log4j:configuration>
>> >>> >
>> >>> >
>> >>> >On Tue, Sep 2, 2014 at 12:02 PM, Chris Riccomini <
>> >>> >[email protected]> wrote:
>> >>> >
>> >>> >> Hey Shekar,
>> >>> >>
>> >>> >> Can you attach your log files? I'm wondering if it's a
>> >>> >> mis-configured log4j.xml (or missing slf4j-log4j jar), which is
>> >>> >> leading to nearly empty log files. Also, I'm wondering if the job
>> >>> >> starts fully. Anything you can attach would be helpful.
>> >>> >>
>> >>> >> Cheers,
>> >>> >> Chris
>> >>> >>
>> >>> >> On 9/2/14 11:43 AM, "Shekar Tippur" <[email protected]> wrote:
>> >>> >>
>> >>> >> >I am running in local mode.
>> >>> >> >
>> >>> >> >S
>> >>> >> >On Sep 2, 2014 11:42 AM, "Yan Fang" <[email protected]>
>>wrote:
>> >>> >> >
>> >>> >> >> Hi Shekar,
>> >>> >> >>
>> >>> >> >> Are you running the job in local mode or YARN mode? If YARN mode,
>> >>> >> >> the log is in YARN's container log.
>> >>> >> >>
>> >>> >> >> Thanks,
>> >>> >> >>
>> >>> >> >> Fang, Yan
>> >>> >> >> [email protected]
>> >>> >> >> +1 (206) 849-4108
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> On Tue, Sep 2, 2014 at 11:36 AM, Shekar Tippur
>> >>><[email protected]>
>> >>> >> >>wrote:
>> >>> >> >>
>> >>> >> >> > Chris,
>> >>> >> >> >
>> >>> >> >> > Got some time to play around a bit more.
>> >>> >> >> > I tried to edit
>> >>> >> >> > samza-wikipedia/src/main/java/samza/examples/wikipedia/task/WikipediaFeedStreamTask.java
>> >>> >> >> > to add a logger info statement to tap the incoming message. I
>> >>> >> >> > don't see the messages being printed to the log file.
>> >>> >> >> >
>> >>> >> >> > Is this the right place to start?
>> >>> >> >> >
>> >>> >> >> > import java.util.Map;
>> >>> >> >> >
>> >>> >> >> > import org.apache.samza.system.IncomingMessageEnvelope;
>> >>> >> >> > import org.apache.samza.system.OutgoingMessageEnvelope;
>> >>> >> >> > import org.apache.samza.system.SystemStream;
>> >>> >> >> > import org.apache.samza.task.MessageCollector;
>> >>> >> >> > import org.apache.samza.task.StreamTask;
>> >>> >> >> > import org.apache.samza.task.TaskCoordinator;
>> >>> >> >> > import org.slf4j.Logger;
>> >>> >> >> > import org.slf4j.LoggerFactory;
>> >>> >> >> >
>> >>> >> >> > import samza.examples.wikipedia.system.WikipediaFeed.WikipediaFeedEvent;
>> >>> >> >> >
>> >>> >> >> > public class WikipediaFeedStreamTask implements StreamTask {
>> >>> >> >> >   private static final SystemStream OUTPUT_STREAM =
>> >>> >> >> >       new SystemStream("kafka", "wikipedia-raw");
>> >>> >> >> >   private static final Logger log =
>> >>> >> >> >       LoggerFactory.getLogger(WikipediaFeedStreamTask.class);
>> >>> >> >> >
>> >>> >> >> >   @Override
>> >>> >> >> >   public void process(IncomingMessageEnvelope envelope,
>> >>> >> >> >       MessageCollector collector, TaskCoordinator coordinator) {
>> >>> >> >> >     Map<String, Object> outgoingMap =
>> >>> >> >> >         WikipediaFeedEvent.toMap((WikipediaFeedEvent) envelope.getMessage());
>> >>> >> >> >     log.info(envelope.getMessage().toString());
>> >>> >> >> >     collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, outgoingMap));
>> >>> >> >> >   }
>> >>> >> >> > }
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> > On Mon, Aug 25, 2014 at 9:01 AM, Chris Riccomini <
>> >>> >> >> > [email protected]> wrote:
>> >>> >> >> >
>> >>> >> >> > > Hey Shekar,
>> >>> >> >> > >
>> >>> >> >> > > Your thought process is on the right track. It's probably
>> >>> >> >> > > best to start with hello-samza and modify it to get what you
>> >>> >> >> > > want. To start with, you'll want to:
>> >>> >> >> > >
>> >>> >> >> > > 1. Write a simple StreamTask that just does something silly,
>> >>> >> >> > > like printing the messages it receives (see the sketch after
>> >>> >> >> > > this list).
>> >>> >> >> > > 2. Write a configuration for the job that consumes from just
>> >>> >> >> > > the stream (alerts from different sources).
>> >>> >> >> > > 3. Run this to make sure you've got it working.
>> >>> >> >> > > 4. Now add your table join. This can be either a change-data
>> >>> >> >> > > capture (CDC) stream, or a remote DB call.
>> >>> >> >> > >
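>> >>> >> >> > > For step 1, a minimal sketch (the class name is a
>> >>> >> >> > > placeholder, not part of hello-samza):
>> >>> >> >> > >
>> >>> >> >> > > import org.apache.samza.system.IncomingMessageEnvelope;
>> >>> >> >> > > import org.apache.samza.task.MessageCollector;
>> >>> >> >> > > import org.apache.samza.task.StreamTask;
>> >>> >> >> > > import org.apache.samza.task.TaskCoordinator;
>> >>> >> >> > >
>> >>> >> >> > > public class PrintAlertsTask implements StreamTask {
>> >>> >> >> > >   @Override
>> >>> >> >> > >   public void process(IncomingMessageEnvelope envelope,
>> >>> >> >> > >       MessageCollector collector, TaskCoordinator coordinator) {
>> >>> >> >> > >     // Just print each message to prove the plumbing works.
>> >>> >> >> > >     System.out.println(envelope.getMessage());
>> >>> >> >> > >   }
>> >>> >> >> > > }
>> >>> >> >> > >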
>> >>> >> >> > > That should get you to a point where you've got your job up
>> >>> >> >> > > and running. From there, you could create your own Maven
>> >>> >> >> > > project and migrate your code over accordingly.
>> >>> >> >> > >
>> >>> >> >> > > Cheers,
>> >>> >> >> > > Chris
>> >>> >> >> > >
>> >>> >> >> > > On 8/24/14 1:42 AM, "Shekar Tippur" <[email protected]>
>> >>>wrote:
>> >>> >> >> > >
>> >>> >> >> > > >Chris,
>> >>> >> >> > > >
>> >>> >> >> > > >I have gone through the documentation and decided that the
>> >>> >> >> > > >option most suitable for me is stream-table.
>> >>> >> >> > > >
>> >>> >> >> > > >I see the following things:
>> >>> >> >> > > >
>> >>> >> >> > > >1. Point Samza to a table (database)
>> >>> >> >> > > >2. Point Samza to a stream - alert streams from different sources
>> >>> >> >> > > >3. Join on a key like a hostname
>> >>> >> >> > > >
>> >>> >> >> > > >I have Hello Samza working. To extend that to what my needs
>> >>> >> >> > > >are, I am not sure where to start (more code changes,
>> >>> >> >> > > >configuration changes, or both)?
>> >>> >> >> > > >
>> >>> >> >> > > >I have gone through
>> >>> >> >> > > >http://samza.incubator.apache.org/learn/documentation/latest/api/overview.html
>> >>> >> >> > > >
>> >>> >> >> > > >Is my thought process on the right track? Can you please
>> >>> >> >> > > >point me in the right direction?
>> >>> >> >> > > >
>> >>> >> >> > > >- Shekar
>> >>> >> >> > > >
>> >>> >> >> > > >
>> >>> >> >> > > >
>> >>> >> >> > > >On Thu, Aug 21, 2014 at 1:08 PM, Shekar Tippur
>> >>> >><[email protected]>
>> >>> >> >> > wrote:
>> >>> >> >> > > >
>> >>> >> >> > > >> Chris,
>> >>> >> >> > > >>
>> >>> >> >> > > >> This is a perfectly good answer. I will start poking more
>> >>> >> >> > > >> into option #4.
>> >>> >> >> > > >>
>> >>> >> >> > > >> - Shekar
>> >>> >> >> > > >>
>> >>> >> >> > > >>
>> >>> >> >> > > >> On Thu, Aug 21, 2014 at 1:05 PM, Chris Riccomini <
>> >>> >> >> > > >> [email protected]> wrote:
>> >>> >> >> > > >>
>> >>> >> >> > > >>> Hey Shekar,
>> >>> >> >> > > >>>
>> >>> >> >> > > >>> Your two options are really (3) or (4), then. You can
>> >>> >> >> > > >>> either run some external DB that holds the data set and
>> >>> >> >> > > >>> query it from a StreamTask, or you can use Samza's state
>> >>> >> >> > > >>> store feature to push the data into a stream that you can
>> >>> >> >> > > >>> then store in a partitioned key-value store along with
>> >>> >> >> > > >>> your StreamTasks. There is some documentation here about
>> >>> >> >> > > >>> the state store approach:
>> >>> >> >> > > >>>
>> >>> >> >> > > >>> http://samza.incubator.apache.org/learn/documentation/0.7.0/container/state-management.html
>> >>> >> >> > > >>>
>> >>> >> >> > > >>> (4) is going to require more up-front effort from you,
>> >>> >> >> > > >>> since you'll have to understand how Kafka's partitioning
>> >>> >> >> > > >>> model works and set up some pipeline to push the updates
>> >>> >> >> > > >>> for your state. In the long run, I believe it's the
>> >>> >> >> > > >>> better approach, though. Local lookups on a key-value
>> >>> >> >> > > >>> store should be faster than doing remote RPC calls to a
>> >>> >> >> > > >>> DB for every message.
>> >>> >> >> > > >>>
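>> >>> >> >> > > >>> To give a feel for the lookup in (4), a rough sketch of a
>> >>> >> >> > > >>> task reading from a local key-value store (the store name
>> >>> >> >> > > >>> and String types are my assumptions, not a fixed contract):
>> >>> >> >> > > >>>
>> >>> >> >> > > >>> import org.apache.samza.config.Config;
>> >>> >> >> > > >>> import org.apache.samza.storage.kv.KeyValueStore;
>> >>> >> >> > > >>> import org.apache.samza.system.IncomingMessageEnvelope;
>> >>> >> >> > > >>> import org.apache.samza.task.InitableTask;
>> >>> >> >> > > >>> import org.apache.samza.task.MessageCollector;
>> >>> >> >> > > >>> import org.apache.samza.task.StreamTask;
>> >>> >> >> > > >>> import org.apache.samza.task.TaskContext;
>> >>> >> >> > > >>> import org.apache.samza.task.TaskCoordinator;
>> >>> >> >> > > >>>
>> >>> >> >> > > >>> public class EnrichmentTask implements StreamTask, InitableTask {
>> >>> >> >> > > >>>   private KeyValueStore<String, String> contextStore;
>> >>> >> >> > > >>>
>> >>> >> >> > > >>>   @Override
>> >>> >> >> > > >>>   @SuppressWarnings("unchecked")
>> >>> >> >> > > >>>   public void init(Config config, TaskContext context) {
>> >>> >> >> > > >>>     // "host-context" is a made-up store name; it must match a
>> >>> >> >> > > >>>     // stores.host-context.* block in the job's config.
>> >>> >> >> > > >>>     contextStore =
>> >>> >> >> > > >>>         (KeyValueStore<String, String>) context.getStore("host-context");
>> >>> >> >> > > >>>   }
>> >>> >> >> > > >>>
>> >>> >> >> > > >>>   @Override
>> >>> >> >> > > >>>   public void process(IncomingMessageEnvelope envelope,
>> >>> >> >> > > >>>       MessageCollector collector, TaskCoordinator coordinator) {
>> >>> >> >> > > >>>     // Local lookup instead of a remote RPC call per message.
>> >>> >> >> > > >>>     String hostContext = contextStore.get((String) envelope.getKey());
>> >>> >> >> > > >>>     // ... append hostContext to the message and send it downstream ...
>> >>> >> >> > > >>>   }
>> >>> >> >> > > >>> }
>> >>> >> >> > > >>>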
>> >>> >> >> > > >>> I'm sorry I can't give you a more definitive answer.
>> >>> >> >> > > >>> It's really about trade-offs.
>> >>> >> >> > > >>>
>> >>> >> >> > > >>> Cheers,
>> >>> >> >> > > >>> Chris
>> >>> >> >> > > >>>
>> >>> >> >> > > >>> On 8/21/14 12:22 PM, "Shekar Tippur"
>><[email protected]>
>> >>> >>wrote:
>> >>> >> >> > > >>>
>> >>> >> >> > > >>> >Chris,
>> >>> >> >> > > >>> >
>> >>> >> >> > > >>> >A big thanks for the swift response. The data set is
>> >>> >> >> > > >>> >huge and arrives in bursts.
>> >>> >> >> > > >>> >What do you suggest?
>> >>> >> >> > > >>> >
>> >>> >> >> > > >>> >- Shekar
>> >>> >> >> > > >>> >
>> >>> >> >> > > >>> >
>> >>> >> >> > > >>> >On Thu, Aug 21, 2014 at 11:05 AM, Chris Riccomini <
>> >>> >> >> > > >>> >[email protected]> wrote:
>> >>> >> >> > > >>> >
>> >>> >> >> > > >>> >> Hey Shekar,
>> >>> >> >> > > >>> >>
>> >>> >> >> > > >>> >> This is feasible, and your thought process is on the
>> >>> >> >> > > >>> >> right track.
>> >>> >> >> > > >>> >>
>> >>> >> >> > > >>> >> For the sake of discussion, I'm going to pretend that
>> >>> >> >> > > >>> >> you have a Kafka topic called "PageViewEvent", which
>> >>> >> >> > > >>> >> has just the IP address that was used to view a page.
>> >>> >> >> > > >>> >> These messages will be logged every time a page view
>> >>> >> >> > > >>> >> happens. I'm also going to pretend that you have some
>> >>> >> >> > > >>> >> state called "IPGeo" (e.g. the MaxMind data set). In
>> >>> >> >> > > >>> >> this example, we'll want to join the long/lat geo
>> >>> >> >> > > >>> >> information from IPGeo to the PageViewEvent, and send
>> >>> >> >> > > >>> >> it to a new topic: PageViewEventsWithGeo.
>> >>> >> >> > > >>> >>
>> >>> >> >> > > >>> >> You have several options on how to implement this
>> >>> >> >> > > >>> >> example.
>> >>> >> >> > > >>> >>
>> >>> >> >> > > >>> >> 1. If your joining data set (IPGeo) is relatively
>> >>> >> >> > > >>> >> small and changes infrequently, you can just pack it
>> >>> >> >> > > >>> >> up in your jar or .tgz file, and open it in every
>> >>> >> >> > > >>> >> StreamTask.
>> >>> >> >> > > >>> >> 2. If your data set is small, but changes somewhat
>> >>> >> >> > > >>> >> frequently, you can throw the data set on some
>> >>> >> >> > > >>> >> HTTP/HDFS/S3 server somewhere, and have your
>> >>> >> >> > > >>> >> StreamTask refresh it periodically by re-downloading it.
>> >>> >> >> > > >>> >> 3. You can do remote RPC calls for the IPGeo data on
>> >>> >> >> > > >>> >> every page view event by querying some remote service
>> >>> >> >> > > >>> >> or DB (e.g. Cassandra).
>> >>> >> >> > > >>> >> 4. You can use Samza's state feature to send your
>> >>> >> >> > > >>> >> IPGeo data as a series of messages to a log-compacted
>> >>> >> >> > > >>> >> Kafka topic
>> >>> >> >> > > >>> >> (https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction),
>> >>> >> >> > > >>> >> and configure your Samza job to read this topic as a
>> >>> >> >> > > >>> >> bootstrap stream
>> >>> >> >> > > >>> >> (http://samza.incubator.apache.org/learn/documentation/0.7.0/container/streams.html).
>> >>> >> >> > > >>> >>
>> >>> >> >> > > >>> >> For (4), you'd have to partition the IPGeo state topic
>> >>> >> >> > > >>> >> according to the same key as PageViewEvent. If
>> >>> >> >> > > >>> >> PageViewEvent were partitioned by, say, member ID, but
>> >>> >> >> > > >>> >> you want your IPGeo state topic to be partitioned by
>> >>> >> >> > > >>> >> IP address, then you'd have to have an upstream job
>> >>> >> >> > > >>> >> that re-partitioned PageViewEvent into some new topic
>> >>> >> >> > > >>> >> by IP address. This new topic will have to have the
>> >>> >> >> > > >>> >> same number of partitions as the IPGeo state topic (if
>> >>> >> >> > > >>> >> IPGeo has 8 partitions, then the new
>> >>> >> >> > > >>> >> PageViewEventRepartitioned topic needs 8 as well).
>> >>> >> >> > > >>> >> This will cause your PageViewEventRepartitioned topic
>> >>> >> >> > > >>> >> and your IPGeo state topic to be aligned, such that
>> >>> >> >> > > >>> >> the StreamTask that gets page views for IP address X
>> >>> >> >> > > >>> >> will also have the IPGeo information for IP address X.
>> >>> >> >> > > >>> >>
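>> >>> >> >> > > >>> >> A rough sketch of the config wiring for (4) (store and
>> >>> >> >> > > >>> >> topic names are made up; check the streams doc above
>> >>> >> >> > > >>> >> for the exact keys in your version):
>> >>> >> >> > > >>> >>
>> >>> >> >> > > >>> >>   # fully consume the IPGeo topic before normal processing
>> >>> >> >> > > >>> >>   systems.kafka.streams.ipgeo.samza.bootstrap=true
>> >>> >> >> > > >>> >>   # back the lookups with a local key-value store
>> >>> >> >> > > >>> >>   stores.ipgeo.factory=org.apache.samza.storage.kv.KeyValueStorageEngineFactory
>> >>> >> >> > > >>> >>   stores.ipgeo.changelog=kafka.ipgeo-changelog
>> >>> >> >> > > >>> >>   stores.ipgeo.key.serde=string
>> >>> >> >> > > >>> >>   stores.ipgeo.msg.serde=string
>> >>> >> >> > > >>> >>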
>> >>> >> >> > > >>> >> Which strategy you pick is really up to you. :) (4) is
>> >>> >> >> > > >>> >> the most complicated, but also the most flexible, and
>> >>> >> >> > > >>> >> most operationally sound. (1) is the easiest if it
>> >>> >> >> > > >>> >> fits your needs.
>> >>> >> >> > > >>> >>
>> >>> >> >> > > >>> >> Cheers,
>> >>> >> >> > > >>> >> Chris
>> >>> >> >> > > >>> >>
>> >>> >> >> > > >>> >> On 8/21/14 10:15 AM, "Shekar Tippur"
>> >>><[email protected]>
>> >>> >> >>wrote:
>> >>> >> >> > > >>> >>
>> >>> >> >> > > >>> >> >Hello,
>> >>> >> >> > > >>> >> >
>> >>> >> >> > > >>> >> >I am new to Samza. I have just installed Hello Samza
>> >>> >> >> > > >>> >> >and got it working.
>> >>> >> >> > > >>> >> >
>> >>> >> >> > > >>> >> >Here is the use case for which I am trying to use Samza:
>> >>> >> >> > > >>> >> >
>> >>> >> >> > > >>> >> >1. Cache the contextual information, which contains
>> >>> >> >> > > >>> >> >more information about the hostname or IP address,
>> >>> >> >> > > >>> >> >using Samza/YARN/Kafka
>> >>> >> >> > > >>> >> >2. Collect alert and metric events, which contain
>> >>> >> >> > > >>> >> >either a hostname or an IP address
>> >>> >> >> > > >>> >> >3. Append the contextual information to the alert or
>> >>> >> >> > > >>> >> >metric and insert it into a Kafka queue that other
>> >>> >> >> > > >>> >> >subscribers read from
>> >>> >> >> > > >>> >> >
>> >>> >> >> > > >>> >> >Can you please shed some light on:
>> >>> >> >> > > >>> >> >
>> >>> >> >> > > >>> >> >1. Is this feasible?
>> >>> >> >> > > >>> >> >2. Am I on the right thought process?
>> >>> >> >> > > >>> >> >3. How do I start?
>> >>> >> >> > > >>> >> >
>> >>> >> >> > > >>> >> >I now have 1 & 2 working disparately. I need to
>> >>> >> >> > > >>> >> >integrate them.
>> >>> >> >> > > >>> >> >
>> >>> >> >> > > >>> >> >Appreciate any input.
>> >>> >> >> > > >>> >> >
>> >>> >> >> > > >>> >> >- Shekar
>> >>> >> >> > > >>> >>
>> >>> >> >> > > >>> >>
>> >>> >> >> > > >>>
>> >>> >> >> > > >>>
>> >>> >> >> > > >>
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> >
>> >>> >> >>
>> >>> >>
>> >>> >>
>> >>>
>> >>>
>> >>
>>
>>
