[ 
https://issues.apache.org/jira/browse/SAMZA-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981719#comment-13981719
 ] 

Chris Riccomini commented on SAMZA-235:
---------------------------------------

I like (2) best. We can add a call-out to the hello-samza tutorial saying that, 
if you don't have internet access, you should skip the raw Wikipedia job and 
instead run a command that pipes data into the Wikipedia raw topic. We can check 
in a small file and tell users to feed it into Kafka with something like:

{noformat}
while sleep 1; do deploy/kafka/bin/kafka-console-producer.sh < wikipedia-raw.json; done
{noformat}
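
A slightly fuller sketch of that loop, in case it helps: the console producer also has to be told where the broker is and which topic to write to. The `--broker-list` and `--topic` options are ones the Kafka 0.8 console producer accepts; the broker address `localhost:9092`, the topic name `wikipedia-raw`, and the function names `feed_once` and `feed_forever` are assumptions for illustration, not something settled in this ticket.

```shell
#!/bin/sh
# Sketch only: replay a checked-in sample file into Kafka when IRC is blocked.
# Every default below is an assumption; override via environment variables.
PRODUCER="${PRODUCER:-deploy/kafka/bin/kafka-console-producer.sh}"
BROKERS="${BROKERS:-localhost:9092}"   # assumed local broker address
TOPIC="${TOPIC:-wikipedia-raw}"        # assumed name of the raw topic
INPUT="${INPUT:-wikipedia-raw.json}"   # sample file, one message per line

# Send the whole sample file to the topic once.
feed_once() {
  "$PRODUCER" --broker-list "$BROKERS" --topic "$TOPIC" < "$INPUT"
}

# Replay the file once per second until interrupted (Ctrl-C),
# or until the producer fails.
feed_forever() {
  while sleep 1; do
    feed_once || return 1
  done
}
```

Sourcing this from the hello-samza checkout and calling `feed_forever` should keep the topic fed. Stopping at the first producer failure is a deliberate choice here, so that a missing broker shows up immediately rather than the loop spinning silently.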

> Add internal input stream for hello-samza
> -----------------------------------------
>
>                 Key: SAMZA-235
>                 URL: https://issues.apache.org/jira/browse/SAMZA-235
>             Project: Samza
>          Issue Type: Improvement
>          Components: hello-samza
>            Reporter: Yan Fang
>            Assignee: Yan Fang
>
> As reported by Sonali and Yan Fang, some corporations block the IRC 
> service/port, so their users cannot run hello-samza successfully. 
> http://mail-archives.apache.org/mod_mbox/samza-dev/201403.mbox/%3cb84b01583bebbc45ad442b3f9045b8ac0ed46...@048-ch1mpn3-331.048d.mgd.msft.net%3E
> As suggested by [~jghoman] and [~criccomini], we should add an internal input 
> stream for hello-samza as an alternative. There are two ways:
> 1. Use simulated/fake data.
> 2. Use data from the local environment.
> I lean toward the first approach. We can simulate Wikimedia data (though it is a 
> little boring), because that lets us reuse WikipediaParserStreamTask and 
> WikipediaStatsStreamTask. Another reason is that, with simulated data, the 
> output is very predictable, which will help bring hello-samza into the 
> integration tests described in SAMZA-205.
> In addition, if we use the FS reader from SAMZA-138, it will also be a good 
> example of writing a SystemFactory (besides the out-of-the-box KafkaSystemFactory).



