The foreachPartition callback is provided with an Iterator by the Spark framework –
while iterator.hasNext() ...

 

Also check whether this isn't some sort of Python Spark API bug – Python seems
to be the foster child here, while Scala and Java are the darlings.
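The iterator point above can be shown without Spark at all. A minimal pure-Python sketch of the pattern (the `send_partition` name and the list stand-in for `connection.send` are hypothetical, for illustration only):

```python
def send_partition(records):
    # records is the iterator Spark hands to the foreachPartition callback;
    # on an empty partition the loop body simply never runs, so nothing breaks
    sent = []
    for record in records:
        sent.append(record)  # stand-in for connection.send(record)
    return sent

print(send_partition(iter([])))        # empty partition: prints []
print(send_partition(iter([1, 2])))    # non-empty partition: prints [1, 2]
```

In other words, the per-partition function should not need an emptiness check of its own.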

 

From: John Omernik [mailto:j...@omernik.com] 
Sent: Friday, June 5, 2015 4:08 PM
To: user
Subject: Spark Streaming for Each RDD - Exception on Empty

 

Is there a pythonic/sparkonic way to test for an empty RDD before using
foreachRDD? Basically I am using the Python example at
https://spark.apache.org/docs/latest/streaming-programming-guide.html to "put
records somewhere". When I have data it works fine; when I don't, I get an
exception. I am not sure about the performance implications of just throwing an
exception every time there is no data – can I just test before sending it?

 

I did see one post mentioning looking for take(1) from the stream to test for
data, but I am not sure where to put that in this example... Is that in the
lambda function, or somewhere else? Looking for pointers!

Thanks!


mydstream.foreachRDD(lambda rdd: rdd.foreachPartition(parseRDD))


Using this example code from the link above:

 

def sendPartition(iter):
    # one connection per partition; iterating over an empty partition is a no-op
    connection = createNewConnection()
    for record in iter:
        connection.send(record)
    connection.close()

dstream.foreachRDD(lambda rdd: rdd.foreachPartition(sendPartition))
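If you do want to skip empty batches explicitly, one place to put the take(1) check is inside the function passed to foreachRDD (or use rdd.isEmpty(), if your Spark version provides it). The sketch below uses a minimal FakeRDD stand-in so the guard logic can run without a SparkContext; with a real stream you would pass send_if_nonempty to mydstream.foreachRDD. All names here are hypothetical:

```python
sent = []

def send_partition(records):
    # stand-in for the docs' sendPartition: collect instead of connection.send
    for record in records:
        sent.append(record)

def send_if_nonempty(rdd):
    # take(1) fetches at most one element -- a cheap emptiness check
    if len(rdd.take(1)) == 0:
        return  # empty batch: skip the connection setup entirely
    rdd.foreachPartition(send_partition)

class FakeRDD:
    """Minimal stand-in for a Spark RDD -- just enough for this sketch."""
    def __init__(self, data):
        self._data = list(data)
    def take(self, n):
        return self._data[:n]
    def foreachPartition(self, f):
        f(iter(self._data))

send_if_nonempty(FakeRDD([]))         # empty: nothing is sent
send_if_nonempty(FakeRDD([1, 2, 3]))  # non-empty: all records are sent
```

With a real DStream the wiring would be `mydstream.foreachRDD(send_if_nonempty)`.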
