Copy should be doable, but I'm not sure how to specify a prefix for the
destination directory while keeping the filename (i.e. part-00000) fixed in
the copy command.
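
The closest I've come up with is listing each batch directory and copying the
objects one at a time; a rough sketch, assuming the boto library and
placeholder bucket/prefix names (the batch id gets folded into the target key
so successive part-00000 files don't collide):

    import boto

    conn = boto.connect_s3()
    bucket = conn.get_bucket("my-bucket")           # placeholder bucket
    batch_prefix = "output/batch-1429204800000/"    # one saveAsTextFile directory
    common_prefix = "output/merged/"                # common destination prefix

    batch_id = batch_prefix.rstrip("/").rsplit("/", 1)[-1]
    for key in bucket.list(prefix=batch_prefix):
        name = key.name.rsplit("/", 1)[-1]          # e.g. part-00000
        if name == "_SUCCESS":
            continue
        # keep the part-NNNNN name but prefix it with the batch id
        key.copy(bucket.name, common_prefix + batch_id + "-" + name)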



> On Apr 16, 2015, at 1:51 PM, Sean Owen <so...@cloudera.com> wrote:
> 
> Just copy the files? It shouldn't matter that much where they are, as
> you can find them easily. Or consider somehow sending the batches of
> data straight into Redshift? No idea how that is done, but I imagine
> it's doable.
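
For the Redshift route, a per-batch COPY from S3 seems like the natural fit.
A minimal sketch, assuming psycopg2 and placeholder table, credential, and
connection values:

    import psycopg2

    # placeholder connection details for the Redshift cluster
    conn = psycopg2.connect(host="my-cluster.us-east-1.redshift.amazonaws.com",
                            port=5439, dbname="dev", user="master", password="...")
    conn.autocommit = True

    def copy_batch_to_redshift(s3_dir):
        # COPY treats the S3 path as a prefix, so one statement picks up every
        # part-NNNNN file written by that batch
        sql = """
            COPY events
            FROM 's3://my-bucket/{}'
            CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
            DELIMITER ','
        """.format(s3_dir)
        with conn.cursor() as cur:
            cur.execute(sql)

    copy_batch_to_redshift("output/batch-1429204800000/")
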
> 
> On Thu, Apr 16, 2015 at 6:38 PM, Vadim Bichutskiy
> <vadim.bichuts...@gmail.com> wrote:
>> Thanks Sean. I want to load each batch into Redshift. What's the best/most 
>> efficient way to do that?
>> 
>> Vadim
>> 
>> 
>>> On Apr 16, 2015, at 1:35 PM, Sean Owen <so...@cloudera.com> wrote:
>>> 
>>> You can't, since that's how it's designed to work. Batches are saved
>>> in different "files", which are really directories containing
>>> partitions, as is common in Hadoop. You can move them later, or just
>>> read them where they are.
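
Reading them in place also seems simple enough, since textFile accepts glob
patterns; a minimal sketch with a placeholder path:

    from pyspark import SparkContext

    sc = SparkContext(appName="read-all-batches")
    # the glob matches the part-* files in every per-batch directory,
    # so all batches come back as a single RDD
    all_batches = sc.textFile("s3n://my-bucket/output/*/part-*")
    print(all_batches.count())
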
>>> 
>>> On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy
>>> <vadim.bichuts...@gmail.com> wrote:
>>>> I am using Spark Streaming, and during each micro-batch I output data to
>>>> S3 using saveAsTextFile. Right now each batch of data is put into its own
>>>> directory containing 2 objects, "_SUCCESS" and "part-00000".
>>>> 
>>>> How do I output each batch into a common directory?
>>>> 
>>>> Thanks,
>>>> Vadim
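
For context, the write side currently looks roughly like this (a sketch with a
placeholder source and paths), which is why each interval lands in its own
directory:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="stream-to-s3")
    ssc = StreamingContext(sc, 60)
    lines = ssc.socketTextStream("localhost", 9999)   # placeholder source

    def save(time, rdd):
        if not rdd.isEmpty():
            # one directory per micro-batch, each holding _SUCCESS and part-NNNNN
            rdd.saveAsTextFile("s3n://my-bucket/output/batch-" +
                               time.strftime("%Y%m%d%H%M%S"))

    lines.foreachRDD(save)
    ssc.start()
    ssc.awaitTermination()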

