Once you need this level of fine-grained control, shouldn't you consider using the programmatic API for that part of the pipeline, so you can control individual jobs?
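For illustration, a minimal sketch of that programmatic alternative. The session name `spark`, the input path, and the column names are hypothetical; the two mechanisms shown (changing `spark.sql.shuffle.partitions` on the session between actions, and pinning an explicit partition count into the lineage with `repartition`) are standard Spark SQL API.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical session and input; replace with your own.
val spark = SparkSession.builder().appName("etl-sketch").getOrCreate()
val df = spark.read.parquet("/data/events")

// Heavy aggregation stage: raise the shuffle-partition count
// before triggering this action.
spark.conf.set("spark.sql.shuffle.partitions", "2000")
val aggregated = df.groupBy("user_id").count()
aggregated.write.parquet("/tmp/agg") // this action shuffles into 2000 partitions

// Lighter follow-up stage: lower the setting before the next action.
spark.conf.set("spark.sql.shuffle.partitions", "200")

// Alternatively, fix the partitioning explicitly in the plan itself,
// independent of the session-wide setting.
val compacted = aggregated.repartition(50)
compacted.write.parquet("/tmp/compact")
```

The caveat from the thread still applies: the `conf.set` value is read when an action runs, not stored per-operator in the lineage, so `repartition` is the only way to bind a partition count to a specific step.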
On Tue, Nov 15, 2016 at 1:19 AM leo9r <lezcano....@gmail.com> wrote:
> Hi Daniel,
>
> I completely agree with your request. As the amount of data processed
> with Spark SQL grows, tweaking sql.shuffle.partitions becomes a common
> need to prevent OOM errors and performance degradation. The fact that
> sql.shuffle.partitions cannot be set several times within the same
> job/action, for the reason you explain, is a big inconvenience for the
> development of ETL pipelines.
>
> Have you received any answer or feedback in this regard?
>
> Thanks,
> Leo Lezcano
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-parameters-like-shuffle-partitions-should-be-stored-in-the-lineage-tp13240p19867.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.