Is there a pythonic/sparkonic way to test for an empty RDD before using
foreachRDD? Basically, I am using the Python example from
https://spark.apache.org/docs/latest/streaming-programming-guide.html to
"put records somewhere". When I have data it works fine, but when I don't
I get an exception. I am not sure about the performance implications of
throwing an exception every time there is no data, but can I just test
before sending?

I did see one post mentioning using take(1) on the stream to test for
data, but I am not sure where to put that in this example... Does it go
in the lambda function, or somewhere else? Looking for pointers!
Thanks!



mydstream.foreachRDD(lambda rdd: rdd.foreachPartition(parseRDD))



Using this example code from the link above:


def sendPartition(iter):
    # createNewConnection() is a placeholder in the guide's example
    connection = createNewConnection()
    for record in iter:
        connection.send(record)
    connection.close()

dstream.foreachRDD(lambda rdd: rdd.foreachPartition(sendPartition))
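If it helps to show what I mean, here is a sketch of where I imagine the take(1) check going: wrapped in a named function passed to foreachRDD instead of the lambda. take(1) returns a list, so an empty list means the batch had no records. FakeRDD is just a stand-in I wrote so the pattern runs without a cluster; it is not part of the Spark API, and guarded/sendPartition are illustrative names:

```python
class FakeRDD:
    """Stand-in for a pyspark RDD, just enough to show the guard."""
    def __init__(self, data):
        self._data = list(data)

    def take(self, n):
        # Like RDD.take: returns up to n elements as a list.
        return self._data[:n]

    def foreachPartition(self, f):
        # Pretend the whole dataset is one partition.
        f(iter(self._data))


sent = []

def sendPartition(it):
    # Stand-in for the "send each record somewhere" handler.
    for record in it:
        sent.append(record)

def guarded(rdd):
    # take(1) returns [] for an empty batch, so the falsy check
    # skips the send entirely instead of raising an exception.
    if rdd.take(1):
        rdd.foreachPartition(sendPartition)

guarded(FakeRDD([]))           # empty batch: nothing sent, no exception
guarded(FakeRDD(["a", "b"]))   # non-empty batch: records handled
```

On a real stream this would be mydstream.foreachRDD(guarded). I believe newer Spark versions also have rdd.isEmpty(), which might be cheaper than take(1), but I have not tested either.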
