Re: Performance and Latency Chart for Flink

amir bahmanyari Mon, 19 Sep 2016 20:52:40 -0700

Hi Greg,

Setting "taskmanager.memory.preallocate" to true caused "Association
with remote system [akka.tcp://flink@" "has failed" "[Disassociated]" on all
TMs, so I changed it back to false.

I increased the network buffers to 1 GB and started to get TM slot
exceptions, so I am now raising that value incrementally. I have it set at
8192 (twice the previous 4096).

Thanks
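
For reference, the buffer counts mentioned in this thread can be converted to memory sizes with a few lines of Python. This is only a sketch: it assumes the 32 KiB buffer size used in the documentation example quoted further down, and the function names are made up for illustration.

```python
# Sketch: convert between a network buffer count and its memory footprint.
# Assumption: each buffer is 32 KiB, as in the documentation example quoted
# later in this thread. Function names are illustrative only.

SEGMENT_SIZE_KIB = 32


def buffers_to_mib(num_buffers, segment_kib=SEGMENT_SIZE_KIB):
    """Return the memory in MiB taken by num_buffers buffers."""
    return num_buffers * segment_kib / 1024


def mib_to_buffers(mib, segment_kib=SEGMENT_SIZE_KIB):
    """Return how many whole buffers fit into the given number of MiB."""
    return int(mib * 1024 // segment_kib)


if __name__ == "__main__":
    for count in (4096, 5000, 8192):
        print(count, "buffers =", buffers_to_mib(count), "MiB")
    print("1 GiB =", mib_to_buffers(1024), "buffers")
```

Under that assumption, 8192 buffers come to 256 MiB, and a full gigabyte would need 32768 buffers.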


      From: Greg Hogan <[email protected]>
 To: [email protected]; amir bahmanyari <[email protected]> 
 Sent: Monday, September 19, 2016 1:28 PM
 Subject: Re: Performance and Latency Chart for Flink
   
My thought would be to compare the data rate and buffer sizes, which together
give a refresh interval. For example, if you are transmitting 1 GB/s on
128 MiB of network buffers then the refresh interval is at most about 1/8
second. There is the same consideration with spill files if the system does
not have sufficient free memory for a large number of readahead buffers.
Another set of buffers is the kernel socket buffers, which you can increase
from the Linux default of 4 MiB by changing
"taskmanager.net.sendReceiveBufferSize" (documentation is in progress; see
org.apache.flink.runtime.io.network.netty.NettyConfig).
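
The refresh-interval arithmetic above can be sketched in Python. This is a rough illustration, not a measurement: it treats 1 GB/s as 1024 MiB/s to keep the units simple, and the function name is hypothetical.

```python
# Sketch: how long a pool of network buffers lasts at a given data rate.
# Assumption: 1 GB/s is treated as 1 GiB/s (1024 MiB/s) to keep units simple.


def refresh_interval_seconds(buffer_mib, rate_mib_per_s):
    """Time in seconds to cycle through the whole buffer pool once."""
    return buffer_mib / rate_mib_per_s


if __name__ == "__main__":
    # 128 MiB of buffers at 1 GiB/s
    print(refresh_interval_seconds(128, 1024))
    # 1 GiB of buffers at 1 GiB/s
    print(refresh_interval_seconds(1024, 1024))
```

A larger pool at the same data rate gives a proportionally longer interval before the buffers are recycled.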

Your nodes have 100+ GB of memory so a conservative assignment might be a
gigabyte of network buffers. Then add the following to the conf, restart
the cluster, start jconsole on a TaskManager, connect to the TaskManager
process, and on the MBeans tab look under org.apache.flink.metrics for
Network.AvailableMemorySegments.

metrics.reporters: my_jmx_reporter
metrics.reporter.my_jmx_reporter.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.my_jmx_reporter.port: 9020-9040


On Mon, Sep 19, 2016 at 3:54 PM, amir bahmanyari <
[email protected]> wrote:

> Thanks Greg. "Your setting of 4096 is only 128 MiB."... Correct, because I
> followed that formula :-))) I can bump it up to twice as much, like what the
> example is doing, to for instance 300 MiB. Is this reasonable? What do you
> suggest as a reasonable range? Thanks Greg
>
>      From: Greg Hogan <[email protected]>
>  To: [email protected]; amir bahmanyari <[email protected]>
>  Sent: Monday, September 19, 2016 12:43 PM
>  Subject: Re: Performance and Latency Chart for Flink
>
> You will need to add the configuration parameters to your flink-conf.yaml.
> I believe the intent is that all configuration parameters should be listed
> at
>
> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#full-reference
>
> My understanding is that the Flink buffers are currently copied to Netty
> buffers, although I don't understand the stated memory doubling.
>
>
> On Mon, Sep 19, 2016 at 3:08 PM, amir bahmanyari <
> [email protected]> wrote:
>
> > Hi Greg,
> >
> > In the same Flink config link below, there are parameters that don't even
> > exist in flink-conf.yaml. Are they defined somewhere else? I grepped for
> > the following and none existed in any of the files under the conf folder:
> > "taskmanager.memory.fraction", taskmanager.memory.off-heap,
> > taskmanager.memory.segment-size, and many more.
> >
> > Also, isn't the example calculating the network buffers wrong? Based on
> > the example, roughly 5000 buffers x 32 KiB = 160000 KiB should be
> > allocated. 160000 KiB divided by 1024 = 156.25 MiB. Why is the example
> > saying "the system would allocate roughly 300 MiBytes for network
> > buffers."? That's roughly twice as much. Am I missing something here?
> >
> > I still need your help to set the accurate number for my
> >    - taskmanager.network.numberOfBuffers = 4096.
> >
> > Thanks for your response Greg.
> > Amir
> >
> >      From: amir bahmanyari <[email protected]>
> >  To: "[email protected]" <[email protected]>
> >  Sent: Monday, September 19, 2016 10:34 AM
> >  Subject: Re: Performance and Latency Chart for Flink
> >
> > Hi Greg,
> >
> > I used this guideline to calculate
> > "taskmanager.network.numberOfBuffers":
> > Apache Flink 1.2-SNAPSHOT Documentation: Configuration
> >
> > 4096 = (16x16)x4x4, where 16 is the number of tasks per TM, 4 is the
> > number of TMs, and the last 4 is the constant in the formula. What would
> > you set it to? Once I have that number, I will set
> > "taskmanager.memory.preallocate" to true and give it another shot.
> > Thanks Greg
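
The sizing formula described above can be written out as a short Python sketch. The function and parameter names are illustrative only; the formula itself is the one stated in the message (tasks per TM squared, times the number of TMs, times 4).

```python
# Sketch of the buffer-count guideline quoted above:
#   (tasks per TM)^2 * (number of TMs) * 4
# Function and parameter names are hypothetical, chosen for illustration.


def recommended_buffers(tasks_per_tm, num_tms, factor=4):
    """Return the buffer count suggested by the guideline."""
    return tasks_per_tm ** 2 * num_tms * factor


if __name__ == "__main__":
    # 16 tasks per TM on 4 TMs, as in the message above
    print(recommended_buffers(16, 4))
```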
> >
> >      From: Greg Hogan <[email protected]>
> >  To: [email protected]; amir bahmanyari <[email protected]>
> >  Sent: Monday, September 19, 2016 8:29 AM
> >  Subject: Re: Performance and Latency Chart for Flink
> >
> > Hi Amir,
> >
> > You may see improved performance setting
> > "taskmanager.memory.preallocate: true" in order to use off-heap memory.
> >
> > Also, your number of buffers looks quite low and you may want to increase
> > "taskmanager.network.numberOfBuffers". Your setting of 4096 is only 128
> > MiB.
> >
> > As this is only a benchmark, are you able to post the code to GitHub to
> > solicit feedback?
> >
> > Greg
> >
> > On Sun, Sep 18, 2016 at 9:00 PM, amir bahmanyari <
> > [email protected]> wrote:
> >
> > > I have new findings and, subsequently, relative improvements. I am
> > > testing as we speak: 4 Beam server nodes on Azure A11 and 2 Kafka nodes
> > > with the same config.
> > >
> > > I had to keep state somewhere, so I went with Redis. I found it to be a
> > > major bottleneck, as the Beam nodes were constantly going across the
> > > network to update its repository. So I replaced Redis with Java
> > > ConcurrentHashMaps. Much faster.
> > >
> > > Then Kafka ran out of disk space and the replication manager
> > > complained, so I clustered the two Kafka nodes hoping they would share
> > > space. As of the second I am typing this email, it is sustaining, but
> > > only 1/2 of the 201401969 tuples have been processed after 3.5 hours.
> > > According to the Linear Road benchmarking expectations, if your system
> > > is working well, all 201401969 tuples must be done in 3.5 hrs max. So
> > > this means there is still room for tuning the Flink nodes. I have
> > > already shared more details about my config with you all.
> > >
> > > It ran perfectly yesterday with almost 1/10th of this load: perfect
> > > real-time send/processed streaming behavior. If that's the case and I
> > > cannot get better performance with FlinkRunner, my next stop is
> > > SparkRunner and a repeat of the whole thing for a final benchmarking of
> > > the two under Beam APIs, which was the initial intent anyway.
> > >
> > > If you have suggestions to make improvements in the above case, I am
> > > all ears and greatly appreciate it.
> > >
> > > Cheers,
> > > Amir
> > >
> > >      From: "Chawla,Sumit" <[email protected]>
> > >  To: [email protected]; amir bahmanyari <[email protected]>
> > >  Sent: Sunday, September 18, 2016 2:07 PM
> > >  Subject: Re: Performance and Latency Chart for Flink
> > >
> > > Has anyone else run these kinds of benchmarks?  Would love to hear more
> > > people's experiences and details about those benchmarks.
> > >
> > > Regards
> > > Sumit Chawla
> > >
> > >
> > > On Sun, Sep 18, 2016 at 2:01 PM, Chawla,Sumit <[email protected]>
> > > wrote:
> > >
> > > > Hi Amir
> > > >
> > > > Would it be possible for you to share the numbers? Also share, if
> > > > possible, your configuration details.
> > > >
> > > > Regards
> > > > Sumit Chawla
> > > >
> > > >
> > > > On Fri, Sep 16, 2016 at 12:18 PM, amir bahmanyari <
> > > > [email protected]> wrote:
> > > >
> > > >> Hi Fabian,
> > > >>
> > > >> FYI, this is a report on other engines on which we did the same type
> > > >> of benchmarking. It also explains what Linear Road benchmarking is.
> > > >> Thanks for your help.
> > > >> http://www.slideshare.net/RedisLabs/walmart-ibm-revisit-the-linear-road-benchmark
> > > >> https://github.com/IBMStreams/benchmarks
> > > >> https://www.datatorrent.com/blog/blog-implementing-linear-road-benchmark-in-apex/
> > > >>
> > > >>
> > > >>      From: Fabian Hueske <[email protected]>
> > > >>  To: "[email protected]" <[email protected]>
> > > >>  Sent: Friday, September 16, 2016 12:31 AM
> > > >>  Subject: Re: Performance and Latency Chart for Flink
> > > >>
> > > >> Hi,
> > > >>
> > > >> I am not aware of periodic performance runs for the Flink releases.
> > > >> I know a few benchmarks which have been published at different
> > > >> points in time like [1], [2], and [3] (you'll probably find more).
> > > >>
> > > >> In general, fair benchmarks that compare different systems (if there
> > > >> is such a thing) are very difficult and the results often depend on
> > > >> the use case.
> > > >> IMO the best option is to run your own benchmarks, if you have a
> > > >> concrete use case.
> > > >>
> > > >> Best, Fabian
> > > >>
> > > >> [1] 08/2015:
> > > >> http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
> > > >> [2] 12/2015:
> > > >> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
> > > >> [3] 02/2016:
> > > >> http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
> > > >>
> > > >>
> > > >> 2016-09-16 5:54 GMT+02:00 Chawla,Sumit <[email protected]>:
> > > >>
> > > >> > Hi
> > > >> >
> > > >> > Is there any performance run that is done for each Flink release?
> > > >> > Or are you aware of any third-party evaluation of performance
> > > >> > metrics for Flink? I am interested in seeing how performance has
> > > >> > improved from release to release, and performance vs. other
> > > >> > competitors.
> > > >> >
> > > >> > Regards
> > > >> > Sumit Chawla
> > > >> >
