On Tue, Nov 12, 2013 at 11:07 AM, Stephen Haberman <
stephen.haber...@gmail.com> wrote:

> Huge disclaimer that this is probably a big pita to implement, and
> could likely not be as worthwhile as I naively think it would be.
>

My perspective on this is that it's already a big pita for Spark users today.

In the absence of explicit directions/hints, Spark should be able to make
ballpark estimates and conservatively pick the # of partitions, storage
strategies (e.g., memory vs. disk), and other runtime parameters that fit the
deployment architecture/capacities. If this requires code and extra
runtime resources for sampling/measuring data, guesstimating job size, and
so on, so be it.
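
To make the sampling/measuring idea concrete, here is a rough sketch (not
anything Spark does today) of deriving a partition count from a handful of
sampled records; the helper name, the sample size, and the 64MB-per-partition
target are all made up for illustration, and records are assumed to be
java-Serializable:

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

import org.apache.spark.rdd.RDD

object PartitionGuess {

  // Very rough proxy for a record's size: its Java-serialized length in bytes.
  private def serializedSize(record: AnyRef): Long = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(record)
    out.close()
    buffer.size().toLong
  }

  // Guesstimate a partition count so that each partition holds roughly
  // targetPartitionBytes of (serialized) data.
  def estimateNumPartitions[T <: AnyRef](
      rdd: RDD[T],
      sampleSize: Int = 100,
      targetPartitionBytes: Long = 64L * 1024 * 1024): Int = {
    val sample = rdd.takeSample(false, sampleSize, 42)
    if (sample.isEmpty) {
      1
    } else {
      val avgRecordBytes = sample.map(serializedSize).sum / sample.length
      val totalBytes = avgRecordBytes * rdd.count()  // costs an extra pass over the data
      math.max(1, math.ceil(totalBytes.toDouble / targetPartitionBytes).toInt)
    }
  }
}

Usage would be something like
rdd.repartition(PartitionGuess.estimateNumPartitions(rdd)) before a heavy
shuffle -- though the whole point is that Spark itself, not the user, should
be doing this kind of guessing when no hints are given.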

Users want working jobs first.  Optimal performance and resource utilization
follow from that.
