Hi Norman,
I saw you were running our Scala examples. Unfortunately, those do not
run as well as our Java examples right now. The Scala API was a bit of
a prototype and still has some efficiency issues. For now, you could
try running our Java examples instead.
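
To give you an idea, here is a rough sketch of a WordCount against the
Java API. The class and package names are from memory for 0.6, so
better check them against the WordCount shipped in the examples
directory; the input and output paths are just placeholders.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCount {

  public static void main(String[] args) throws Exception {
    final ExecutionEnvironment env =
        ExecutionEnvironment.getExecutionEnvironment();
    // nodes x cores per node, matching the config advice below
    env.setDegreeOfParallelism(32);

    DataSet<String> text = env.readTextFile(args[0]); // e.g. an hdfs:// path

    text.flatMap(new Tokenizer()) // emit (word, 1) pairs
        .groupBy(0)               // group by the word
        .sum(1)                   // sum up the counts
        .writeAsCsv(args[1], "\n", " ");

    env.execute("WordCount");
  }

  // Splits each line into lowercase words and emits (word, 1) for each.
  public static final class Tokenizer
      extends FlatMapFunction<String, Tuple2<String, Integer>> {
    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
      for (String token : value.toLowerCase().split("\\W+")) {
        if (token.length() > 0) {
          out.collect(new Tuple2<String, Integer>(token, 1));
        }
      }
    }
  }
}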

For your cluster, good configuration values would be
taskmanager.numberOfTaskSlots = 8 (the number of CPU cores per node)
and parallelization.degree.default = 32 (number of nodes x number of
CPU cores per node).
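
In your flink-conf.yaml that would be the following, leaving the other
settings as they are:

taskmanager.numberOfTaskSlots: 8
parallelization.degree.default: 32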

The Scala API is being rewritten for our next release, so if you
really want to check out Scala examples, I could point you to my
personal branch on GitHub where development of the new Scala API is
taking place.

Cheers,
Aljoscha

On Mon, Sep 8, 2014 at 2:48 PM, Norman Spangenberg
<[email protected]> wrote:
> Hello,
> I'm a bit confused about the performance of Flink.
> My cluster consists of 4 nodes, each with 8 cores and 16 GB of memory
> (1.5 GB reserved for the OS). I'm using Flink 0.6 in standalone
> cluster mode.
> I played a little bit with the config settings, but without much
> impact on execution time.
> flink-conf.yaml:
> jobmanager.rpc.port: 6123
> jobmanager.heap.mb: 1024
> taskmanager.heap.mb: 14336
> taskmanager.memory.size: -1
> taskmanager.numberOfTaskSlots: 4
> parallelization.degree.default: 16
> taskmanager.network.numberOfBuffers: 4096
> fs.hdfs.hadoopconf: /opt/yarn/hadoop-2.4.0/etc/hadoop/
>
> I tried two applications, the WordCount and KMeans Scala example code.
> WordCount needs 5 minutes for 25 GB and 13 minutes for 50 GB.
> KMeans (10 iterations) needs 86 seconds for 56 MB of input, but 33
> minutes for 1.1 GB and nearly 90 minutes for 2.2 GB!
>
> The monitoring tool Ganglia shows low CPU utilization and a lot of
> waiting time for KMeans; in WordCount, CPU utilization is nearly 100
> percent.
> Is this an ordinary execution time for Flink? Or are optimizations
> to my config necessary? Or is there maybe a bottleneck in the cluster?
>
> I hope somebody can help me :)
> Greets, Norman
