Copy should be doable, but I'm not sure how to specify a prefix for the directory while keeping the filename (i.e., part-00000) fixed in the copy command.
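One way to do the copy (assuming you can run a small script outside Spark, and using boto3) is to fold each batch's directory name into the destination key so the copies don't collide under the common prefix. A minimal sketch -- the bucket and prefix names here are made up, not from this thread:

    import boto3

    s3 = boto3.client("s3")
    bucket = "my-bucket"            # assumption: your S3 bucket
    src_root = "streaming-output/"  # assumption: where saveAsTextFile writes batches
    dst_root = "streaming-merged/"  # assumption: the common directory you want

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=src_root):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith("/part-00000"):
                continue  # skip _SUCCESS markers and anything else
            # e.g. "streaming-output/batch-1429212660000/part-00000"
            batch_dir = key[len(src_root):].split("/")[0]
            dst_key = "{}{}-part-00000".format(dst_root, batch_dir)
            s3.copy_object(Bucket=bucket, Key=dst_key,
                           CopySource={"Bucket": bucket, "Key": key})

The rename happens entirely in the destination key string, so you get one flat prefix of uniquely named files without touching the originals.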
> On Apr 16, 2015, at 1:51 PM, Sean Owen <so...@cloudera.com> wrote:
>
> Just copy the files? It shouldn't matter that much where they are, as
> you can find them easily. Or consider somehow sending the batches of
> data straight into Redshift? No idea how that is done, but I imagine
> it's doable.
>
> On Thu, Apr 16, 2015 at 6:38 PM, Vadim Bichutskiy
> <vadim.bichuts...@gmail.com> wrote:
>> Thanks Sean. I want to load each batch into Redshift. What's the
>> best/most efficient way to do that?
>>
>> Vadim
>>
>>> On Apr 16, 2015, at 1:35 PM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>> You can't, since that's how it's designed to work. Batches are saved
>>> in different "files", which are really directories containing
>>> partitions, as is common in Hadoop. You can move them later, or just
>>> read them where they are.
>>>
>>> On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy
>>> <vadim.bichuts...@gmail.com> wrote:
>>>> I am using Spark Streaming, and during each micro-batch I output
>>>> data to S3 using saveAsTextFile. Right now each batch of data is put
>>>> into its own directory containing 2 objects, "_SUCCESS" and
>>>> "part-00000".
>>>>
>>>> How do I output each batch into a common directory?
>>>>
>>>> Thanks,
>>>> Vadim
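For the "straight into Redshift" idea Sean floats above, one option is to issue a per-batch COPY against each batch's S3 directory as it lands, e.g. from a foreachRDD hook. A sketch using psycopg2 -- the host, table name, credentials, and delimiter are all assumptions, not from this thread:

    import psycopg2

    conn = psycopg2.connect(
        host="my-cluster.example.us-east-1.redshift.amazonaws.com",  # assumption
        port=5439, dbname="dev", user="admin", password="...")       # assumption
    conn.autocommit = True

    def load_batch(batch_dir):
        # COPY reads every object under the given prefix, so pointing it
        # at "part-" inside one batch directory picks up part-00000 and
        # skips the zero-byte _SUCCESS marker.
        sql = """
            COPY events
            FROM 's3://my-bucket/streaming-output/{}/part-'
            CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'
            DELIMITER '\\t'
        """.format(batch_dir)
        with conn.cursor() as cur:
            cur.execute(sql)

This skips the intermediate "common directory" entirely: each micro-batch goes from its own S3 directory straight into the table, and you never have to rename part files at all.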