I have an HDFS folder that receives new audio files every few minutes. My objective is to detect the new files added to the folder and then process them in parallel, without splitting any file into multiple blocks. For example, if four new audio files are added, I want the Spark engine to detect the four file names/locations, so that I can pass it those four locations and have it use four processors, one per file.
I tried using FileStream, but with it I would have to split the files into blocks, which I do not want. Is there any other solution?
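
To make the requirement concrete, here is a minimal sketch of the workflow described above; it illustrates the goal and is not a confirmed solution. It assumes the folder listing is obtained separately (for example by polling HDFS every few minutes) and that `sc` is an existing PySpark `SparkContext`. The names `find_new_files`, `process_whole_files`, and `process_file` are hypothetical and are not part of Spark.

```python
def find_new_files(listing, seen):
    """Return the paths in `listing` that are not yet in `seen`.

    `listing` is the current folder listing (how it is obtained is an
    assumption left to the caller); `seen` is a set updated in place.
    """
    fresh = sorted(p for p in set(listing) if p not in seen)
    seen.update(fresh)
    return fresh


def process_whole_files(sc, paths, process_file):
    """Call `process_file(path)` once per path, one Spark partition per file.

    `process_file` is a hypothetical user function that opens and handles
    the whole audio file itself, so no file is split into blocks.
    """
    if not paths:
        return []
    # One slice per path: each file location becomes its own task.
    return sc.parallelize(paths, len(paths)).map(process_file).collect()
```

With four new paths, `process_whole_files` would create four partitions and therefore four tasks; how many of them actually run at the same time depends on the cores available to the executors.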