I've got an RDD where each element is a long string (a whole document). I'm
using pyspark so some of the handy partition-handling functions aren't
available, and I count the number of elements in each partition with:
def count_partitions(id, iterator):
c = sum(1 for _ in iterator)
On Thu, Aug 28, 2014 at 7:00 AM, Rok Roskar rokros...@gmail.com wrote:
I've got an RDD where each element is a long string (a whole document). I'm
using pyspark so some of the handy partition-handling functions aren't
available, and I count the number of elements in each partition with:
def