Hi,

On 2017-09-21 05:34 AM, liguang li wrote:
> Hi,
>
> On Wed, Sep 20, 2017 at 10:58 PM, Jonathan Rajotte Julien
> <[email protected]> wrote:
>> Hi,
>>
>> On 2017-09-20 05:12 AM, liguang li wrote:
>>> On Tue, Sep 19, 2017 at 10:57 PM, Jonathan Rajotte-Julien
>>> <[email protected]> wrote:
>>>> On Tue, Sep 19, 2017 at 03:53:27PM +0800, liguang li wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, Sep 18, 2017 at 11:18 PM, Jonathan Rajotte-Julien
>>>>> <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Mon, Sep 18, 2017 at 11:32:07AM +0800, liguang li wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Create a session in live-reading mode, run an application with a
>>>>>>> very high event throughput, then print the events with
>>>>>>> babeltrace. We found that the live trace viewer is showing
>>>>>>> events from a few seconds ago, and as time
>>>>>> Could you provide us with the versions used for babeltrace,
>>>>>> lttng-tools and lttng-ust?
>>>>> Babeltrace: 1.5.1
>>>> Update to babeltrace 1.5.3.
>>>>> Lttng-tools: 2.8.6
>>>> Update to lttng-tools 2.8.8.
>>>>> Lttng-ust: 2.8.2
>>>>>
>>>>>>> goes on, the delay gets bigger and bigger.
>>>>>> A similar issue was observed a couple of months ago, involving
>>>>>> multiple delayed-ack problems during the communication between
>>>>>> lttng-relayd and babeltrace.
>>>>>> The following fixes were merged:
>>>>>>
>>>>>> [1] https://github.com/lttng/lttng-tools/commit/b6025e9476332b75eb8184345c3eb3e924780088
>>>>>> [2] https://github.com/efficios/babeltrace/commit/de417d04317ca3bc30f59685a9d19de670e4b11d
>>>>>> [3] https://github.com/efficios/babeltrace/commit/4594dbd8f7c2af2446a3e310bee74ba4a2e9d648
>>>>>>
>>>>>> In the event that you are already using updated versions of
>>>>>> babeltrace and lttng-tools, it would be pertinent to provide us
>>>>>> with a simple reproducer so we can assess the issue.
>>>> The version you are using does not include the mentioned fixes.
>>>> Please update and redo your experiment.
>>> We tested with the versions listed above, and the issue still
>>> exists.
>> Given that previous versions had a major timing problem there, I
>> would expect some improvement.
>>
>> In that case, we will need a lot more information on your
>> benchmarking strategy: a simple reproducer, your benchmark code (R,
>> gnuplot, etc.) and your overall methodology, so that we can
>> reproduce the issue locally.
>> Otherwise, it will be very hard to come to any conclusion.
> Sorry, we cannot provide detailed information about the benchmark
> code due to company regulations.
>
>>>>>>> Steps:
>>>>>>>
>>>>>>> lttng create session --live -U net://*
>>>>>>> lttng enable-channel -s session -u ch1
>>>>>>> lttng enable-event -s session -c ch1 -u -a
>>>>>>> lttng start
>>>>>>>
>>>>>>> Run a high event throughput application, which is a
>>>>>>> multithreaded application.
>>> In the multithreaded application, each tracepoint records the wall
>>> time of the system, so we can easily reproduce this issue by
>>> comparing the time of the recorded event with the system wall time.
>>>>>>> babeltrace -i lttng-live net://*
>>>>>>>
>>>>>>> After a while, we found that the timestamps of the events
>>>>>>> printed by babeltrace differ from the time on the host running
>>>>>>> the application, and the delay gets bigger and bigger as time
>>>>>>> goes on.
>>>>>>>
>>>>>>> I checked the source code and found that babeltrace in
>>>>>>> live-reading mode reads the recorded events from the CTF files,
>>>>>>> then parses and prints them in a single thread. The process is
>>>>>>> a little slow; do you have any ideas to improve it?
>>> From my understanding of the source code, parsing and printing the
>>> events consumes a lot of time. For example, the multithreaded
>>> application consumes 3 CPUs, so in a given time interval 3
>>> sub-buffers are filled, sent to the lttng-relayd daemon and
>>> recorded into the CTF files. If, in the same interval, babeltrace
>>> only handles the events of 2 sub-buffers, the issue happens.
>> Did you perform a bisection to find where the delay comes from?
>> Reception of packets? Formatting of events?
>> What is the throughput of the application?

This is a key element if anyone wants to test at home. It can be
expressed in events/s or events/ms.

>> How many tracepoint definitions? Does babeltrace catch up if a
>> quiescent period is given?
> I think so.
>
>> Could you provide us with statistics, timing data, etc.?
>>
>> What type of delay are we talking about?
> For example, the system wall time is 5:18 PM now, but babeltrace is
> still printing events that have a timestamp around 5:10 PM. And after
> the application is stopped, babeltrace still needs some time to print
> the recorded events.
>
> I think the root cause is that the parsing and printing process is a
> bit slow, so I want to know if there is any method to improve the
> performance of this process.

A simple experiment would be to run a bounded workload and perform a
quick timing evaluation of babeltrace with a streamed trace (on disk)
and in live mode. Measure the time it takes babeltrace to read the
on-disk trace (time babeltrace path_to_streamed_trace), then perform
the same experiment with a live trace. It is important to bound your
experiment and to use mostly the same workload in both experiments.
This will give us a rough estimate of the disparity between the two
scenarios.

Cheers

_______________________________________________
lttng-dev mailing list
[email protected]
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
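The bounded comparison suggested above can be sketched as a small
script. This is only a sketch using babeltrace 1.x command-line
syntax; the host name, session name and trace path are hypothetical
placeholders, not values taken from this thread:

```shell
#!/bin/sh
# Sketch of the bounded timing comparison suggested above.
# HOST, SESSION and TRACE_DIR are hypothetical placeholders.
HOST=relayd.example.com
SESSION=my-live-session
TRACE_DIR="$HOME/lttng-traces/$SESSION"

if command -v babeltrace >/dev/null 2>&1; then
    # 1) Time babeltrace reading the already-streamed trace from disk.
    time babeltrace "$TRACE_DIR" > /dev/null

    # 2) Time babeltrace reading the same bounded workload over lttng-live.
    time babeltrace -i lttng-live \
        "net://$HOST/host/$(hostname)/$SESSION" > /dev/null
else
    echo "babeltrace not installed; skipping timing runs"
fi
```

Comparing the two wall-clock times for the same bounded workload gives
a rough measure of how much of the delay comes from the live protocol
versus from parsing and printing itself.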

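The sub-buffer imbalance described in the thread (3 sub-buffers
produced per interval while only 2 are consumed) is enough to explain
a delay that grows without bound. A toy model of that arithmetic, with
the per-interval rates taken from the example above and the number of
intervals chosen arbitrarily:

```shell
#!/bin/sh
# Toy model of the backlog: the tracers fill 3 sub-buffers per
# interval, but the single-threaded viewer only drains 2, so the
# backlog (and therefore the viewing delay) grows by 1 each interval.
produced_per_interval=3
consumed_per_interval=2
backlog=0
interval=1
while [ "$interval" -le 5 ]; do
    backlog=$((backlog + produced_per_interval - consumed_per_interval))
    echo "after interval $interval: backlog = $backlog sub-buffers"
    interval=$((interval + 1))
done
```

The point of the model is that any steady production rate above the
consumption rate makes the delay linear in elapsed time, which matches
the "bigger and bigger" behaviour reported in the thread.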