Just to make sure that there is no confusion: with Aljoscha's refactoring, the Scala API will be a thin layer on top of the Java API and should have comparable, if not identical, performance (one difference is that Scala tuples are immutable, whereas Java tuples are mutable, so instances can be reused).
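To illustrate the tuple difference, here is a minimal Scala sketch (the JTuple2 alias is just a local rename of Flink's Java Tuple2 for this example; the reuse pattern is a common optimization, not code from either API):

    // Scala tuples are immutable: every "update" allocates a new instance.
    val s1 = ("word", 1)
    val s2 = s1.copy(_2 = s1._2 + 1)  // s2 is a new object; s1 is unchanged

    // Flink's Java tuples are mutable, so a single instance can be reused
    // across records to avoid per-record allocations:
    import org.apache.flink.api.java.tuple.{Tuple2 => JTuple2}
    val j = new JTuple2[String, Integer]("word", 1)
    j.f1 = j.f1 + 1  // mutates in place; no new object is created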
I don't want someone in the future reading this thread to think that the Scala API incurs a large performance hit. ;)

On Mon, Sep 8, 2014 at 5:13 PM, Aljoscha Krettek <[email protected]> wrote:
> Ok.
>
> My work is available here:
> https://github.com/aljoscha/incubator-flink/tree/scala-rework
>
> Please look at the WordCount and KMeans examples to see how the API has
> changed; basically, only the way you create Data Sources is different.
>
> I'm looking forward to your feedback. :D
>
> On Mon, Sep 8, 2014 at 4:22 PM, Norman Spangenberg
> <[email protected]> wrote:
> > I tried different values for numberOfTaskSlots (1, 2, 4, 8) and the DOP
> > to optimize Flink.
> > @Aljoscha: it would be great to try out the new Scala API for Flink. I
> > have already written some other apps in Scala, so I wouldn't have to
> > rewrite them.
> >
> > On 08.09.2014 16:13, Robert Metzger wrote:
> >
> >> There is probably a little typo in Aljoscha's answer:
> >> taskmanager.numberOfTaskSlots should be 8 (there are 8 cores per
> >> machine). The parallelization.degree.default is correct.
> >>
> >> On Mon, Sep 8, 2014 at 4:09 PM, Aljoscha Krettek <[email protected]>
> >> wrote:
> >>
> >>> Hi Norman,
> >>> I saw you were running our Scala examples. Unfortunately, those do not
> >>> run as well as our Java examples right now. The Scala API was a bit of
> >>> a prototype that has some issues with efficiency. For now, you could
> >>> maybe try running our Java examples.
> >>>
> >>> For your cluster, good configuration values would be numberOfTaskSlots
> >>> = 4 (number of CPU cores) and parallelization.degree.default = 32
> >>> (number of nodes x number of CPU cores).
> >>>
> >>> The Scala API is being rewritten for our next release, so if you
> >>> really want to check out Scala examples I could point you to my
> >>> personal branch on GitHub where development of the new Scala API is
> >>> taking place.
> >>>
> >>> Cheers,
> >>> Aljoscha
> >>>
> >>> On Mon, Sep 8, 2014 at 2:48 PM, Norman Spangenberg
> >>> <[email protected]> wrote:
> >>>> Hello,
> >>>> I'm a bit confused about the performance of Flink.
> >>>> My cluster consists of 4 nodes, each with 8 cores and 16 GB of memory
> >>>> (1.5 GB reserved for the OS), running flink-0.6 in standalone-cluster
> >>>> mode. I played a little bit with the config settings, but without
> >>>> much impact on execution time.
> >>>> flink-conf.yaml:
> >>>> jobmanager.rpc.port: 6123
> >>>> jobmanager.heap.mb: 1024
> >>>> taskmanager.heap.mb: 14336
> >>>> taskmanager.memory.size: -1
> >>>> taskmanager.numberOfTaskSlots: 4
> >>>> parallelization.degree.default: 16
> >>>> taskmanager.network.numberOfBuffers: 4096
> >>>> fs.hdfs.hadoopconf: /opt/yarn/hadoop-2.4.0/etc/hadoop/
> >>>>
> >>>> I tried two applications: the WordCount and k-means Scala example
> >>>> code. WordCount needs 5 minutes for 25 GB and 13 minutes for 50 GB.
> >>>> k-means (10 iterations) needs 86 seconds for 56 MB of input, but with
> >>>> 1.1 GB of input it needs 33 minutes, and with 2.2 GB nearly 90
> >>>> minutes!
> >>>>
> >>>> The monitoring tool Ganglia shows low CPU utilization and a lot of
> >>>> waiting time; in WordCount, CPU utilization is nearly 100 percent.
> >>>> Is this an ordinary order of magnitude for execution time in Flink?
> >>>> Or are optimizations in my config necessary? Or maybe there is a
> >>>> bottleneck in the cluster?
> >>>>
> >>>> I hope somebody can help me :)
> >>>> Greets, Norman
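For future readers: applying Robert's correction (8 slots per 8-core machine) and Aljoscha's parallelism formula (4 nodes x 8 cores = 32) to Norman's posted config would give the following flink-conf.yaml. This is just the thread's advice spelled out, not a tested configuration:

    jobmanager.rpc.port: 6123
    jobmanager.heap.mb: 1024
    taskmanager.heap.mb: 14336
    taskmanager.memory.size: -1
    taskmanager.numberOfTaskSlots: 8       # was 4; one slot per core (Robert)
    parallelization.degree.default: 32     # 4 nodes x 8 cores (Aljoscha)
    taskmanager.network.numberOfBuffers: 4096
    fs.hdfs.hadoopconf: /opt/yarn/hadoop-2.4.0/etc/hadoop/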
