Basically you need to unbundle the elements of the RDD and then store them 
wherever you want - use foreachPartition and then iterate over each 
partition's records with foreach.
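A minimal sketch of that pattern, using plain Python lists in place of an RDD's partitions so it runs without a Spark cluster. The `write_partition` helper and the `sink` list are illustrative stand-ins (e.g. for a Redshift connection), not Spark API:

```python
def write_partition(records, sink):
    """Write one partition's records to an external store.

    In a real Spark job this function body runs once per partition on the
    executors, which is why foreachPartition is the right place to open
    and close one database connection per partition instead of one per
    record.
    """
    for record in records:  # the inner foreach over the partition's elements
        sink.append(record)

# Stand-in for rdd.foreachPartition(lambda it: write_partition(it, sink)):
partitions = [["a", "b"], ["c", "d", "e"]]
sink = []
for part in partitions:
    write_partition(part, sink)

print(sink)  # every element, written partition by partition
```

In an actual job you would pass `write_partition` to `rdd.foreachPartition` inside the streaming `foreachRDD`, replacing `sink` with a real connection opened and closed inside the function.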

-----Original Message-----
From: Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com] 
Sent: Thursday, April 16, 2015 6:39 PM
To: Sean Owen
Cc: user@spark.apache.org
Subject: Re: saveAsTextFile

Thanks Sean. I want to load each batch into Redshift. What's the best/most 
efficient way to do that?

Vadim


> On Apr 16, 2015, at 1:35 PM, Sean Owen <so...@cloudera.com> wrote:
> 
> You can't, since that's how it's designed to work. Batches are saved 
> in different "files", which are really directories containing 
> partitions, as is common in Hadoop. You can move them later, or just 
> read them where they are.
> 
> On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy 
> <vadim.bichuts...@gmail.com> wrote:
>> I am using Spark Streaming where during each micro-batch I output 
>> data to S3 using saveAsTextFile. Right now each batch of data is put 
>> into its own directory containing
>> 2 objects, "_SUCCESS" and "part-00000."
>> 
>> How do I output each batch into a common directory?
>> 
>> Thanks,
>> Vadim

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


