You can always repartition after the filter. Depending on your use case, it 
may also make sense to keep several RDDs with the same data but different 
partitioning strategies, or to choose an appropriate on-disk format (ORC, 
Parquet). Which option fits best also depends on your users' non-functional 
requirements.
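
As a rough illustration of the "repartition after the filter" option, here is 
a minimal PySpark sketch. It assumes the RDD holds (timestamp, value) pairs 
and that the data is spread roughly evenly over time. The function names and 
the proportional sizing heuristic are hypothetical, not something from this 
thread; only filter, getNumPartitions and repartition are actual RDD methods.

```python
import math


def partitions_after_filter(parent_partitions, selected_months, total_months):
    """Scale the partition count to the share of data kept by the filter.

    Assumption (hypothetical heuristic): data volume is roughly proportional
    to the number of months selected.
    """
    fraction = selected_months / total_months
    return max(1, math.ceil(parent_partitions * fraction))


def select_period(rdd, start, end, total_months, selected_months):
    """Filter an RDD of (timestamp, value) pairs to [start, end) and repartition.

    The filtered RDD inherits the parent's partition count, so many of its
    partitions may be small or empty. repartition() shuffles the surviving
    records into a smaller, balanced set of partitions; coalesce() would be a
    cheaper alternative that avoids a full shuffle but may leave skew.
    """
    filtered = rdd.filter(lambda rec: start <= rec[0] < end)
    target = partitions_after_filter(
        rdd.getNumPartitions(), selected_months, total_months
    )
    return filtered.repartition(target)
```

With 20 parent partitions covering 48 months, selecting 6 months would give 
ceil(20 * 6 / 48) = 3 partitions under this heuristic. Whether the shuffle is 
worth its cost depends on how much work follows the filter.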

> On 2. Apr 2017, at 12:32, <jasbir.s...@accenture.com> 
> <jasbir.s...@accenture.com> wrote:
> 
> Hi,
>  
> I have an RDD with 4 years of data and, say, 20 partitions. At runtime, the 
> user can select a few months or years of the RDD. That is, the RDD is 
> filtered based on the user's time selection, and further transformations and 
> actions are performed on the filtered RDD. And, as Spark documents, a child 
> RDD gets its partitions from the parent RDD.
>  
> Therefore, is there any way to decide the partitioning strategy after the 
> filter operation?
>  
> Regards,
> Jasbir Singh
> 
> 
