Hi Spark Streaming developers and users,
We are prototyping with Spark Streaming and hit the following two issues on which I
would like to seek your expertise.
1) We have a Spark Streaming application in Scala that reads data from Kafka
into a DStream, does some processing, and outputs a transformed DStream. If for
some reason the Kafka connection is unavailable or times out, the streaming
job starts emitting empty RDDs from that point on. The log is clean, with no
ERROR indicator. I googled around and this seems to be a known issue. We
believe the Spark Streaming infrastructure should either retry or surface an
error/exception. Can you share how you handle this case?
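For reference, our ingest is essentially the receiver-based pattern in the sketch below (the ZooKeeper quorum, consumer group, topic name, and the KafkaIngestSketch object are placeholders, not our real code). The foreachRDD check is the only way we have found so far to even notice the empty batches:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaIngestSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaIngestSketch")
        val ssc  = new StreamingContext(conf, Seconds(60))

        // Receiver-based Kafka stream (the API we are on with 1.1.0);
        // ZooKeeper quorum, group id, and topic are placeholders.
        val lines = KafkaUtils.createStream(
          ssc,
          "zk-host:2181",          // ZooKeeper quorum
          "my-consumer-group",     // consumer group id
          Map("events" -> 1)       // topic -> receiver thread count
        ).map(_._2)

        // After a Kafka outage/timeout, every batch arrives like this:
        // an empty RDD, with nothing in the logs flagging the lost connection.
        lines.foreachRDD { (rdd, time) =>
          if (rdd.take(1).isEmpty) {
            println(s"Empty batch at $time -- no traffic, or a dead Kafka connection?")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }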
2) We would like to implement a Spark Streaming job that joins a 1-minute-duration
DStream of real-time events with a metadata RDD read from a database. The
metadata changes only slightly each day in the database. What is the best
practice for refreshing that RDD daily while keeping the streaming join job
running? Is this doable as of Spark 1.1.0?
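For illustration, the pattern we were considering is roughly the sketch below: transform re-evaluates its closure on every batch, and a daily refresh swaps the metadata RDD reference. The loadMetadata stub and the MetadataJoinSketch object are placeholders for our real JDBC read; we are not sure whether this is safe or idiomatic, so please correct us:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair-RDD implicits on 1.1.0
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    object MetadataJoinSketch {
      // Placeholder for our real JDBC read; returns metadata keyed by id.
      def loadMetadata(sc: SparkContext): RDD[(String, String)] =
        sc.parallelize(Seq(("id-1", "meta-1"), ("id-2", "meta-2")))

      @volatile private var metadata: RDD[(String, String)] = _

      // Wire the join: transform runs its body once per batch on the driver,
      // so each 1-minute batch joins against whatever `metadata` points to.
      def wireUp(sc: SparkContext,
                 events: DStream[(String, String)]): DStream[(String, (String, String))] = {
        metadata = loadMetadata(sc).cache()
        events.transform { rdd => rdd.join(metadata) }
      }

      // Called from a scheduled thread once a day: reload, materialize, swap.
      def refreshMetadata(sc: SparkContext): Unit = {
        val fresh = loadMetadata(sc).cache()
        fresh.count()              // force evaluation before swapping
        val old = metadata
        metadata = fresh
        if (old != null) old.unpersist()
      }
    }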
Thanks.
Tian
