Hey guys,

Long time, no see :). I recently started a new job, and it involves
performing real-time data analytics using Apache Kafka, Storm and
Flume.

What happens, at a very high level, is that a set of signals is
collected and stored in a Kafka topic, and then Storm is used to
filter certain fields out or to enrich them with other
meta-information. Finally, Flume writes the output into multiple HDFS
files depending on the date, hour, etc.
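
To make that concrete, here is roughly how I imagine the Kafka +
filter/enrich part looking in Flink's DataStream API. This is only a
sketch: the topic name, the Kafka properties and the filter/enrich
bodies are placeholders I made up, and I'm assuming the
FlinkKafkaConsumer082 class from the Kafka connector module.

import java.util.Properties;

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class SignalPipelineSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "signal-analytics");

        // Source: the signals land in a Kafka topic, as in the Storm setup.
        DataStream<String> signals = env.addSource(
                new FlinkKafkaConsumer082<String>(
                        "signals", new SimpleStringSchema(), props));

        // Storm's job today: filter fields out, enrich with meta-information.
        DataStream<String> enriched = signals
                .filter(new FilterFunction<String>() {
                    @Override
                    public boolean filter(String line) {
                        return !line.isEmpty(); // stand-in for the real field filter
                    }
                })
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String line) {
                        return line + ",some-meta-info"; // stand-in for the enrichment
                    }
                });

        // Flume's job today -- writing per-date/per-hour HDFS files --
        // is exactly the part I'm asking about below.
        enriched.print();

        env.execute("Signal pipeline sketch");
    }
}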

Now, I saw that Flink can handle a similar pipeline, but without
needing Flume for the writing-to-HDFS part (see
http://data-artisans.com/kafka-flink-a-practical-how-to/). Which
brings me to my question: how does Flink handle writing to multiple
files in a streaming fashion? Until now I was playing with the batch
API, and writeAsCsv just takes one file as a parameter.
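
For reference, this is the kind of thing I meant on the batch side
(toy data; the hour-bucket keys and the HDFS path are made up):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class BatchWriteSketch {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Toy records keyed by a date-hour bucket.
        DataSet<Tuple2<String, Integer>> data = env.fromElements(
                new Tuple2<String, Integer>("2015-10-14-09", 1),
                new Tuple2<String, Integer>("2015-10-14-10", 2));

        // A single output path for the whole dataset -- I don't see how
        // to fan this out into per-date/per-hour files the way Flume does.
        data.writeAsCsv("hdfs:///signals/output.csv");

        env.execute("Batch write sketch");
    }
}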

Next question: what are the prerequisites for deploying a Flink
Streaming job on a cluster? YARN, HDFS, anything else?

Final question, more of a request: I'd like to play around with Flink
Streaming to see whether it can substitute for Storm in this use case,
and whether it can outrun it :P. To this end, I'll need some starting
points: docs, blog posts, examples to read. Any input would be useful.

I wanted to dig up a newbie task in the streaming area, but I could
not find one... can we think of something easy to get me started?

Thanks! Hope you guys had fun at Flink Forward!
Andra
