Hi,

CollectionInputFormat currently enforces a parallelism of 1 by implementing NonParallelInput, and it serializes the entire Collection. If my understanding is correct, this serialized InputFormat is often the reason a newly submitted job exceeds the Akka message size limit.
As an alternative, the Collection elements could be serialized into multiple InputSplits. Has this idea been considered and rejected?

Thanks,
Greg
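
P.S. To make the idea concrete, here is a rough, standalone sketch of what I mean by serializing the elements into multiple splits. It deliberately does not use any Flink classes: CollectionSplitSketch, SerializedChunk, createSplits and readSplit are hypothetical names made up for illustration, and plain Java serialization stands in for whatever serializer the real format would use. A real version would presumably implement Flink's InputSplit interface instead, and I have not verified how splits are shipped to the tasks, so whether this actually avoids the message size limit is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectionSplitSketch {

    /** Hypothetical split: one independently serialized chunk of the collection. */
    public static final class SerializedChunk implements Serializable {
        private static final long serialVersionUID = 1L;
        final int splitNumber;
        final byte[] bytes;

        SerializedChunk(int splitNumber, byte[] bytes) {
            this.splitNumber = splitNumber;
            this.bytes = bytes;
        }
    }

    /** Cuts the list into numSplits contiguous ranges and serializes each range on its own. */
    public static <T extends Serializable> List<SerializedChunk> createSplits(
            List<T> elements, int numSplits) throws IOException {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be at least 1");
        }
        List<SerializedChunk> splits = new ArrayList<>(numSplits);
        int size = elements.size();
        for (int i = 0; i < numSplits; i++) {
            // Range boundaries; long arithmetic avoids int overflow for large collections.
            int from = (int) ((long) i * size / numSplits);
            int to = (int) ((long) (i + 1) * size / numSplits);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeInt(to - from); // element count, so the reader knows when to stop
                for (T element : elements.subList(from, to)) {
                    out.writeObject(element);
                }
            }
            splits.add(new SerializedChunk(i, buffer.toByteArray()));
        }
        return splits;
    }

    /** Deserializes the elements of a single split, in their original order. */
    public static <T> List<T> readSplit(SerializedChunk split)
            throws IOException, ClassNotFoundException {
        List<T> result = new ArrayList<>();
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(split.bytes))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                @SuppressWarnings("unchecked")
                T element = (T) in.readObject();
                result.add(element);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        for (SerializedChunk split : createSplits(data, 3)) {
            List<Integer> chunk = readSplit(split);
            System.out.println("split " + split.splitNumber + " -> " + chunk);
        }
    }
}
```

With this layout no single serialized object contains the whole Collection, and each split can be read independently, which is what would allow a parallelism greater than 1. Empty splits are possible when there are fewer elements than splits; the sketch simply emits them with a count of zero.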