[ 
https://issues.apache.org/jira/browse/SAMZA-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967191#comment-13967191
 ] 

Yan Fang commented on SAMZA-235:
--------------------------------

True. Processing a stored file under a stream (real-time) processing framework 
does look a little awkward. As for the "no end" property, we could loop over 
the files indefinitely. The only concern is the "real-time" scenario. My 
workaround would be either 1) to direct users to try the external resource 
first and turn to internal file processing only when the external approach does 
not work (this will need a very clear explanation in the docs...), or 2) to 
make hello-samza also accept the user's Kafka input from the console for a 
topic, say, wikipedia-raw. hello-samza would then show the relevant results 
immediately (real-time calculation) in another topic, say, wikipedia-edits. (We 
can use other topic names, of course.) 

Otherwise, for the "real-time" scenario, we have to use local environment 
metrics for the internal input stream, which you shied away from previously.

What do you think?

> Add internal input stream for hello-samza
> -----------------------------------------
>
>                 Key: SAMZA-235
>                 URL: https://issues.apache.org/jira/browse/SAMZA-235
>             Project: Samza
>          Issue Type: Improvement
>          Components: hello-samza
>            Reporter: Yan Fang
>
> As reported by Sonali and Yan Fang, some corporations block the IRC 
> service/port, so they are not able to run hello-samza successfully. 
> http://mail-archives.apache.org/mod_mbox/samza-dev/201403.mbox/%3cb84b01583bebbc45ad442b3f9045b8ac0ed46...@048-ch1mpn3-331.048d.mgd.msft.net%3E
> As suggested by [~jghoman] and [~criccomini], we should add an internal input 
> stream for hello-samza as an alternative. There are two ways:
> 1. Use simulated/fake data. 
> 2. Use local-environment-related data.
> I lean toward the first approach. We can simulate Wikimedia data (though it is 
> a little boring), because that lets us reuse the WikipediaParserStreamTask and 
> WikipediaStatsStreamTask. Another reason is that, since we use simulated data, 
> the output is very predictable, which will help bring hello-samza into the 
> integration tests described in SAMZA-205.
> In addition, if we use the FS reader in SAMZA-138, that will also be a good 
> example of writing a SystemFactory (besides the out-of-the-box 
> KafkaSystemFactory).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
