Hi Samza devs, users and enthusiasts,

I've kept an eye on the Samza project for a while and I think it's super cool! 
I hope it continues to mature and expand as it seems very promising (:

One thing I've been wondering for a while is: how do people serve the data they 
compute with Samza? More specifically:

  1.  How do you expose the output of Samza jobs to online applications that 
need low-latency reads?
  2.  Are these online apps mostly internal (e.g. analytics, dashboards, etc.) 
or public/user-facing?
  3.  What systems do you currently use (or plan to use in the short-term) to 
host the data generated in Samza? HBase? Cassandra? MySQL? Druid? Others?
  4.  Are you satisfied or are you facing challenges in terms of the write 
throughput supported by these storage/serving systems? What about read 
throughput?
  5.  Are there situations where you wish to re-process all historical data 
when improving your Samza job, which then requires re-ingesting all of the 
Samza output into your online serving system (as described in the Kappa 
Architecture<http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html>)? 
Is this easy breezy, or painful? Do you need to throttle it lest your serving 
system fall over?
  6.  If there was a highly-optimized and reliable way of ingesting partitioned 
streams quickly into your online serving system, would that help you leverage 
Samza more effectively?
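To make question 5 concrete: a minimal sketch of what throttled re-ingestion 
might look like, in Java since that's Samza's home language. The serving-store 
write is a stand-in (here just appending to a list), not any real client API, 
and the batch-then-pause policy is just one illustrative assumption about how 
one might cap write pressure during a full historical replay.

```java
import java.util.ArrayList;
import java.util.List;

public class ThrottledReingest {

    // Write records to the serving store, pausing after every
    // maxWritesPerBatch writes so a full replay cannot monopolize
    // the store's write capacity and starve online reads.
    static List<String> reingest(List<String> records, int maxWritesPerBatch,
                                 long pauseMillis) throws InterruptedException {
        List<String> written = new ArrayList<>();
        int inBatch = 0;
        for (String record : records) {
            written.add(record);            // stand-in for servingStore.put(...)
            if (++inBatch == maxWritesPerBatch) {
                Thread.sleep(pauseMillis);  // back off between batches
                inBatch = 0;
            }
        }
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> records = List.of("a", "b", "c", "d", "e");
        long start = System.nanoTime();
        List<String> out = reingest(records, 2, 50);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("wrote " + out.size() + " records in "
                + elapsedMs + " ms");
    }
}
```

Real deployments would presumably throttle per partition and tune the rate 
against observed read latency on the serving store, rather than using a fixed 
sleep like this.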

Your insights would be much appreciated!


Thanks (:


--
Felix
