I have an HDFS folder that receives new audio files every few minutes.
My objective is to detect the new files as they are added to the folder and
then process them in parallel, without splitting any file into multiple
blocks. For example, if 4 new audio files are added, I want the Spark
engine to detect the four file names/locations so that I can pass those
four locations on and have four processors each process one whole file.
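
To make the goal concrete, here is a minimal sketch in plain Python (not
Spark) of the behaviour I am after. It assumes a local directory standing in
for the HDFS folder and a thread pool standing in for Spark tasks; the names
find_new_files, process_audio and process_in_parallel are hypothetical. As far
as I understand, the rough Spark analogue of the last step would be something
like sc.parallelize(paths, len(paths)), so that each path ends up in its own
partition, but I have not verified that this fits a streaming job.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def find_new_files(folder, seen):
    """Return paths in `folder` not yet in `seen`, and record them as seen."""
    current = {os.path.join(folder, name) for name in os.listdir(folder)}
    new_files = sorted(current - seen)
    seen.update(new_files)
    return new_files


def process_audio(path):
    """Hypothetical stand-in for the real audio processing step."""
    with open(path, "rb") as handle:
        data = handle.read()  # the whole file is read as one unit, never split
    return path, len(data)


def process_in_parallel(paths):
    """Process each file as a single unit of work, one worker per file."""
    if not paths:
        return []
    # Threads are only an assumption here; in Spark each file would be a task.
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        return list(pool.map(process_audio, paths))
```

Calling find_new_files(folder, seen) every few minutes with the same seen set
would return only the files added since the previous call, and
process_in_parallel would then hand each of those files, whole, to its own
worker.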

I tried using FileStream, but with it I would have to split the files into
blocks, which I do not want. Is there any other solution?