[ https://issues.apache.org/jira/browse/SAMZA-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967191#comment-13967191 ]
Yan Fang commented on SAMZA-235:
--------------------------------
True. Processing stored files in a stream (real-time) processing framework
does look a little awkward. As for the "no end" issue, we could loop over the
files and process them indefinitely. The only remaining concern is the
"real-time" scenario. My proposed workarounds would be: 1) direct users to try
the external resource first, and only fall back to internal file processing
when the external approach does not work (this will need to be explained very
well in the docs); or 2) make hello-samza also accept user input from the Kafka
console for a topic such as wikipedia-raw. hello-samza would then show the
relevant results immediately (real-time calculation) in another topic, such as
wikipedia-edits. (We can use other topic names, of course.)
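The "loop to process files infinitely" idea can be sketched as an iterator that wraps around to the first line whenever it reaches the end of the file. This is a minimal, standalone sketch assuming the whole file fits in memory; `LoopingFileReplay` and `fromFile` are hypothetical names, and the class is not wired into Samza's consumer API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: replays a fixed set of lines forever, so a finite
// file behaves like an unbounded stream with "no end".
public class LoopingFileReplay implements Iterator<String> {
    private final List<String> lines;
    private int position = 0;

    public LoopingFileReplay(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("cannot loop over an empty file");
        }
        this.lines = List.copyOf(lines);
    }

    // Loads all lines of a local file up front (assumes the file is small).
    public static LoopingFileReplay fromFile(Path path) throws IOException {
        return new LoopingFileReplay(Files.readAllLines(path));
    }

    @Override
    public boolean hasNext() {
        return true; // never exhausted: we wrap around to the first line
    }

    @Override
    public String next() {
        String line = lines.get(position);
        position = (position + 1) % lines.size(); // wrap at end of file
        return line;
    }

    public static void main(String[] args) {
        LoopingFileReplay replay = new LoopingFileReplay(List.of("edit-a", "edit-b", "edit-c"));
        for (int i = 0; i < 5; i++) {
            System.out.println(replay.next()); // edit-a, edit-b, edit-c, edit-a, edit-b
        }
    }
}
```

Because `hasNext()` always returns true, a consumer polling this iterator never sees an end of stream. Pacing (for example, a short sleep between messages) would still be needed to make the replay resemble a live feed, which is exactly where the "real-time" concern comes in.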
Otherwise, for the "real-time" scenario, we would have to use local environment
metrics as the internal input stream, which you shied away from previously.
What do you think?
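For comparison, the "local environment metrics" alternative could look roughly like the following sketch, which samples JVM statistics through `java.lang.Runtime` so that any machine can generate input without network access. `LocalMetricsSampler` and its key=value message format are hypothetical. Unlike simulated data, these values change from run to run, which makes them harder to assert on in tests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical source of "local environment" data: samples JVM statistics
// that every machine can produce without network access.
public class LocalMetricsSampler {
    private final Runtime runtime = Runtime.getRuntime();

    // One sample rendered as key=value pairs (an illustrative message format).
    public String sample() {
        long used = runtime.totalMemory() - runtime.freeMemory();
        return "processors=" + runtime.availableProcessors()
            + " usedMemoryBytes=" + used
            + " maxMemoryBytes=" + runtime.maxMemory()
            + " timestampMs=" + System.currentTimeMillis();
    }

    // Takes `count` samples, sleeping `intervalMs` between consecutive ones.
    public List<String> sampleEvery(long intervalMs, int count) throws InterruptedException {
        List<String> samples = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            samples.add(sample());
            if (i < count - 1) {
                Thread.sleep(intervalMs);
            }
        }
        return samples;
    }

    public static void main(String[] args) throws InterruptedException {
        new LocalMetricsSampler().sampleEvery(100, 3).forEach(System.out::println);
    }
}
```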
> Add internal input stream for hello-samza
> -----------------------------------------
>
> Key: SAMZA-235
> URL: https://issues.apache.org/jira/browse/SAMZA-235
> Project: Samza
> Issue Type: Improvement
> Components: hello-samza
> Reporter: Yan Fang
>
> As reported by Sonali and Yan Fang, some corporations block the IRC
> service/port, so users there will not be able to run hello-samza successfully.
> http://mail-archives.apache.org/mod_mbox/samza-dev/201403.mbox/%3cb84b01583bebbc45ad442b3f9045b8ac0ed46...@048-ch1mpn3-331.048d.mgd.msft.net%3E
> As suggested by [~jghoman] and [~criccomini], we should add an internal input
> stream for hello-samza as an alternative. There are two approaches:
> 1. use simulated/fake data.
> 2. use data related to the local environment.
> I lean toward the first approach. We can simulate Wikimedia data (though it is
> a little boring), because it can reuse the WikipediaParserStreamTask and
> WikipediaStatsStreamTask. Another reason is that, since we use simulated data,
> the output is very predictable, which will help bring hello-samza into the
> integration test described in SAMZA-205.
> In addition, if we use the FS reader from SAMZA-138, that will also be a good
> example for writing a SystemFactory (besides the out-of-the-box KafkaSystemFactory).
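The predictability argument for simulated data can be illustrated with a seeded random generator: two generators created with the same seed emit exactly the same sequence, so an integration test can compare a job's output against fixed expectations. `FakeWikipediaEdits`, its channel/title/user lists, and the tab-separated record format are all hypothetical. They do not match the real hello-samza feed, so reusing WikipediaParserStreamTask unchanged would require emitting that task's actual input format instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator of fake Wikipedia edit events. A fixed seed makes
// the sequence (and therefore any downstream statistics) reproducible.
public class FakeWikipediaEdits {
    private static final String[] CHANNELS = {"#en.wikipedia", "#de.wikipedia", "#fr.wikipedia"};
    private static final String[] TITLES = {"Apache Samza", "Stream processing", "Apache Kafka", "YARN"};
    private static final String[] USERS = {"alice", "bob", "carol"};

    private final Random random;

    public FakeWikipediaEdits(long seed) {
        this.random = new Random(seed);
    }

    // One simulated edit as channel<TAB>title<TAB>user<TAB>byteDiff
    // (an illustrative format, not the real hello-samza feed format).
    public String nextEdit() {
        String channel = CHANNELS[random.nextInt(CHANNELS.length)];
        String title = TITLES[random.nextInt(TITLES.length)];
        String user = USERS[random.nextInt(USERS.length)];
        int byteDiff = random.nextInt(2001) - 1000; // between -1000 and +1000
        return channel + "\t" + title + "\t" + user + "\t" + byteDiff;
    }

    // Generates the next n edits in order.
    public List<String> take(int n) {
        List<String> edits = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            edits.add(nextEdit());
        }
        return edits;
    }

    public static void main(String[] args) {
        new FakeWikipediaEdits(42L).take(3).forEach(System.out::println);
    }
}
```

The same seed always yields the same edits, so expected counts per channel or per user can be computed once and asserted in an integration test.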
--
This message was sent by Atlassian JIRA
(v6.2#6252)