Ordering is preserved within each partition, but there is no global ordering across partitions.

You typically want to acquire resources inside the foreachPartition
closure, just before handling the iterator.

http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd
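
To make the pattern concrete, here is a minimal Spark-free sketch in plain Java. Everything in it is hypothetical and only for illustration: `PartitionSinkSketch`, `Pool`, and `Connection` are stand-ins, not Spark or Kafka APIs. `handlePartition` plays the role of the body of the `VoidFunction<Iterator<String>>` you pass to foreachPartition: it borrows one connection per partition (not per record), walks the iterator in order, and returns the connection in a `finally` block.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Illustration only: none of these types come from Spark or Kafka.
class PartitionSinkSketch {

    // Hypothetical connection; send() stands in for a real producer call.
    static final class Connection {
        private int sentCount = 0;
        void send(String record) { sentCount++; }
        int sentCount() { return sentCount; }
    }

    // Hypothetical static pool, assumed to live once per JVM (one per executor).
    static final class Pool {
        private final Deque<Connection> idle = new ArrayDeque<>();

        Pool(int size) {
            for (int i = 0; i < size; i++) idle.push(new Connection());
        }

        synchronized Connection borrow() {
            if (idle.isEmpty()) throw new IllegalStateException("pool exhausted");
            return idle.pop();
        }

        synchronized void giveBack(Connection c) { idle.push(c); }

        synchronized int idleCount() { return idle.size(); }
    }

    // Stand-in for the foreachPartition closure body.
    // Returns the records in the order they were sent.
    static List<String> handlePartition(Iterator<String> records, Pool pool) {
        Connection conn = pool.borrow();          // acquire once per partition
        List<String> delivered = new ArrayList<>();
        try {
            while (records.hasNext()) {           // iterator order is preserved
                String record = records.next();
                conn.send(record);
                delivered.add(record);
            }
        } finally {
            pool.giveBack(conn);                  // always return the connection
        }
        return delivered;
    }

    public static void main(String[] args) {
        Pool pool = new Pool(2);
        // Two lists simulate two partitions of one RDD.
        List<List<String>> partitions = Arrays.asList(
            Arrays.asList("t1", "t2", "t3"),
            Arrays.asList("t4", "t5"));
        for (List<String> partition : partitions) {
            System.out.println(handlePartition(partition.iterator(), pool));
        }
    }
}
```

In a real job the partitions are processed on different executors, possibly concurrently, so records are seen in order only within a single call to the closure. Borrowing inside the closure also avoids having to serialize the connection from the driver.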

On Mon, Nov 16, 2015 at 4:02 PM, Nipun Arora <nipunarora2...@gmail.com>
wrote:

> Hi,
> I wanted to understand forEachPartition logic. In the code below, I am
> assuming the iterator is executing in a distributed fashion.
>
> 1. Assuming I have a stream which has timestamp data which is sorted. Will
> the stringiterator in foreachPartition process each line in order?
>
> 2. Assuming I have a static pool of Kafka connections, where should I get
> a connection from a pool to be used to send data to Kafka?
>
> addMTSUnmatched.foreachRDD(
>         new Function<JavaRDD<String>, Void>() {
>             @Override
>             public Void call(JavaRDD<String> stringJavaRDD) throws Exception {
>                 stringJavaRDD.foreachPartition(
>
>                         new VoidFunction<Iterator<String>>() {
>                             @Override
>                             public void call(Iterator<String> stringIterator) throws Exception {
>                                 while(stringIterator.hasNext()){
>                                     String str = stringIterator.next();
>                                     if(OnlineUtils.ESFlag) {
>                                         OnlineUtils.printToFile(str, 1, type1_outputFile, OnlineUtils.client);
>                                     }else{
>                                         OnlineUtils.printToFile(str, 1, type1_outputFile);
>                                     }
>                                 }
>                             }
>                         }
>                 );
>                 return null;
>             }
>         }
> );
>
>
>
> Thanks
>
> Nipun
>
>
