[
https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324542#comment-16324542
]
Greg Hogan commented on FLINK-8414:
-----------------------------------
It is incumbent on the user to configure an appropriate parallelism for the
quantity of data. Those graphs contain only a few tens of megabytes of data,
so it is not surprising that the optimal parallelism is around (or even lower
than) 16. You can use `VertexMetrics` to pre-compute the size of the graph and
adjust the parallelism at runtime (`ExecutionConfig#setParallelism`). Flink and
Gelly are designed to scale to hundreds or thousands of parallel tasks and
gigabytes to terabytes of data.
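
A minimal sketch of that pattern, for illustration (Flink 1.x Java API; the
generated `CompleteGraph` input and the vertices-per-task heuristic are
placeholder assumptions for the example, not recommendations):

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.generator.CompleteGraph;
import org.apache.flink.graph.library.metric.undirected.VertexMetrics;
import org.apache.flink.types.LongValue;
import org.apache.flink.types.NullValue;

public class SizeAwareParallelism {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Placeholder input: a small generated graph stands in for the real dataset.
        Graph<LongValue, NullValue, NullValue> graph =
                new CompleteGraph(env, 1_000).generate();

        // First program: pre-compute the size of the graph with VertexMetrics.
        VertexMetrics<LongValue, NullValue, NullValue> metrics =
                new VertexMetrics<LongValue, NullValue, NullValue>();
        metrics.run(graph);
        env.execute("Pre-compute graph metrics");

        long vertexCount = metrics.getResult().getNumberOfVertices();

        // Illustrative heuristic only: one task per 100k vertices, capped at the
        // 128 slots of the cluster described in this issue.
        int parallelism = (int) Math.min(128L, Math.max(1L, vertexCount / 100_000L));
        env.getConfig().setParallelism(parallelism);

        // Second program: the actual Gelly algorithm now runs at the adjusted
        // parallelism, e.g. graph.run(new ConnectedComponents<>(...)).
    }
}
```

Note that this submits two programs: a cheap one to measure the graph, then
the real job at the adjusted parallelism.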
> Gelly performance seriously decreases when using the suggested parallelism
> configuration
> ----------------------------------------------------------------------------------------
>
> Key: FLINK-8414
> URL: https://issues.apache.org/jira/browse/FLINK-8414
> Project: Flink
> Issue Type: Bug
> Components: Configuration, Documentation, Gelly
> Reporter: flora karniav
> Priority: Minor
>
> I am running Gelly examples with different datasets on a cluster of 5
> machines (1 JobManager and 4 TaskManagers) with 32 cores each.
> The number-of-slots parameter is set to 32 (as suggested) and the parallelism
> to 128 (32 cores * 4 TaskManagers); see the configuration sketch below this
> quoted description.
> I observe a vast performance degradation with these suggested settings
> compared to, for example, setting parallelism.default to 16, where the same
> job completes in ~60 seconds vs. ~140 seconds in the 128-parallelism case.
> Is there something wrong with my configuration? Should I decrease the
> parallelism, and, if so, will this inevitably decrease CPU utilization?
> Another matter that may be related is the number of partitions of the data.
> Is this somehow related to the parallelism? How many partitions are created
> in the case of parallelism.default=128?
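
For reference, the setup described above would correspond roughly to the
following flink-conf.yaml entries (key names as in Flink 1.x; the values are
taken from the quoted description):

```yaml
taskmanager.numberOfTaskSlots: 32   # one slot per core on each TaskManager, as suggested
parallelism.default: 128            # 32 slots * 4 TaskManagers; the value under discussion
```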
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)