Yes, this can definitely be added. I just haven't gotten around to doing it :) There is a proposal for this that you can try - https://github.com/apache/spark/pull/2765/files . Have you reviewed it at some point?
On Fri, Apr 3, 2015 at 1:08 PM, Adam Ritter <adamge...@gmail.com> wrote:

> That doesn't seem like a good solution unfortunately, as I would need this
> to work in a production environment. Do you know why the limitation exists
> for FileInputDStream in the first place? Unless I'm missing something
> important about how some of the internals work, I don't see why this
> feature couldn't be added at some point.
>
> On Fri, Apr 3, 2015 at 12:47 PM, Tathagata Das <t...@databricks.com> wrote:
>
>> A sort-of-hacky workaround is to use a queueStream, where you can
>> manually create RDDs (using sparkContext.hadoopFile) and insert them into
>> the queue. Note that this is for testing only, as queueStream does not
>> work with driver fault recovery.
>>
>> TD
>>
>> On Fri, Apr 3, 2015 at 12:23 PM, adamgerst <adamge...@gmail.com> wrote:
>>
>>> So after pulling my hair out for a bit trying to convert one of my
>>> standard Spark jobs to streaming, I found that FileInputDStream does
>>> not support nested folders (see the brief mention here:
>>> http://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources
>>> the fileStream method returns a FileInputDStream). Before, for my
>>> standard job, I was reading from, say,
>>>
>>> s3n://mybucket/2015/03/02/*log
>>>
>>> and could also modify it to get an entire month's worth of logs. Since
>>> the logs are split up by date, when the batch ran for the day I simply
>>> passed in the date as a parameter to make sure I was reading the
>>> correct data.
>>>
>>> But since I want to turn this job into a streaming job, I need to do
>>> something like
>>>
>>> s3n://mybucket/*log
>>>
>>> This would work fine if it were a standard Spark application, but it
>>> fails for streaming. Is there any way I can get around this limitation?
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-FileStream-Nested-File-Support-tp22370.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.