Yan Fang created SAMZA-235:
------------------------------

             Summary: Add internal input stream for hello-samza
                 Key: SAMZA-235
                 URL: https://issues.apache.org/jira/browse/SAMZA-235
             Project: Samza
          Issue Type: Improvement
          Components: hello-samza
            Reporter: Yan Fang


As reported by Sonali and Yan Fang, some corporations block the IRC 
service/port, so their users cannot run hello-samza successfully. 
http://mail-archives.apache.org/mod_mbox/samza-dev/201403.mbox/%3cb84b01583bebbc45ad442b3f9045b8ac0ed46...@048-ch1mpn3-331.048d.mgd.msft.net%3E

As suggested by Jakob Homan and Chris Riccomini, we should add an internal 
input stream for hello-samza as an alternative. There are two ways:
1. Use simulated/fake data. 
2. Use data derived from the local environment.

I lean toward the first approach. We can simulate Wikimedia data (though it 
is a little boring), because that lets us reuse the WikipediaParserStreamTask 
and WikipediaStatsStreamTask. Another reason is that, with simulated data, the 
output is very predictable, which will help bring hello-samza into the 
integration tests described in SAMZA-205.
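
To illustrate why simulated data makes the output predictable, here is a 
minimal, language-neutral sketch in Python. It is not hello-samza code: the 
field names (title, user, diff_bytes, time), the sample values, and the 
function names are all hypothetical, and the real feed format may differ. 
The point is only that a seeded generator emits the same events on every run, 
so a test can assert on exact results.

```python
import random
from typing import Dict, Iterable, Iterator

# Hypothetical sample values and field names; NOT the real hello-samza schema.
TITLES = ["Apache Samza", "Apache Kafka", "Stream processing"]
USERS = ["alice", "bob", "carol"]


def simulated_edits(count: int, seed: int = 42) -> Iterator[Dict[str, object]]:
    """Yield `count` fake edit events; the same seed always gives the same events."""
    rng = random.Random(seed)  # private seeded generator, no global state
    for i in range(count):
        yield {
            "title": rng.choice(TITLES),
            "user": rng.choice(USERS),
            "diff_bytes": rng.randint(-500, 500),
            "time": i,  # logical timestamp, so no wall-clock dependence
        }


def edits_by_title(events: Iterable[Dict[str, object]]) -> Dict[str, int]:
    """Count events per title, a stand-in for a simple stats task."""
    counts: Dict[str, int] = {}
    for event in events:
        title = str(event["title"])
        counts[title] = counts.get(title, 0) + 1
    return counts


if __name__ == "__main__":
    first = list(simulated_edits(5))
    second = list(simulated_edits(5))
    print(first == second)  # True: same seed, same stream
    print(edits_by_title(first))
```

Using a logical counter instead of the wall clock for the timestamp keeps 
every field reproducible, which is what an integration test would need.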

In addition, if we use the FS reader from SAMZA-138, it will also be a good 
example of writing a SystemFactory (besides the out-of-the-box 
KafkaSystemFactory).





--
This message was sent by Atlassian JIRA
(v6.2#6252)
