Coalesce tries to reduce the number of partitions into smaller number of
partitions, without moving the data around (as much as possible). Since
most of received data is in a few machines (those running receivers),
coallesce just makes bigger merged partitions in those.

Without coalesce
Machine 1: 10 partitions processing in parallel
Machine 2: 2 partitions processing in parallel

With coalesce
Machine 1: 10 partitions merged into 1 partition processed together taking
10 times longer
Machine 2: 2 partitions merged into 1 partition process together taking 2
times longer

Hope this clarifies.

TD

On Fri, Apr 10, 2015 at 5:16 AM, 邓刚[技术中心] <triones.d...@vipshop.com> wrote:

>   Hi All,
>
>          We are running a spark streaming application. The data source is
> kafka, the data partition of kafka is not well-distributed
> <http://dict.cn/well-distributed>  but every receiver on every executor
> can receive data, just different of the amount.
>
> and our data is very large so we try to a local repartition with
> coalesce(*.false). but we found an odd appearances。
>
>
>
> Most of the task running on one executor.  See picture one. When we remove
> the coalesce call the task can distributed
> <http://dict.cn/well-distributed>  better see picture two. Any one knows
> why?
>
>
>
> *Picture one*
>
>
>
>
>
> *Picture two*
>
>
> 本电子邮件可能为保密文件。如果阁下非电子邮件所指定之收件人,谨请立即通知本人。敬请阁下不要使用、保存、复印、打印、散布本电子邮件及其内容,或将其用于其他任何目的或向任何人披露。谢谢您的合作!
> This communication is intended only for the addressee(s) and may contain
> information that is privileged and confidential. You are hereby notified
> that, if you are not an intended recipient listed above, or an authorized
> employee or agent of an addressee of this communication responsible for
> delivering e-mail messages to an intended recipient, any dissemination,
> distribution or reproduction of this communication (including any
> attachments hereto) is strictly prohibited. If you have received this
> communication in error, please notify us immediately by a reply e-mail
> addressed to the sender and permanently delete the original e-mail
> communication and any attachments from all storage devices without making
> or otherwise retaining a copy.
>

Reply via email to