Speaking for Android Studio, I think we *could* use a python-based
implementation (hard to say exactly without knowing the details), but
I believe a different implementation could be *easier* to integrate.
Plus, if the solution integrates more closely with lldb, we could
surface some of the data in the command-line client as well.

pl

On 1 February 2016 at 10:30, Ravitheja Addepally
<ravithejaw...@gmail.com> wrote:
> And what about the ease of integration into an IDE? I don't really know
> whether the python-based approach would be usable in that context.
>
> On Mon, Feb 1, 2016 at 11:17 AM, Pavel Labath <lab...@google.com> wrote:
>>
>> It feels to me that the python-based approach could run into a dead
>> end fairly quickly: a) you can only access the data when the target is
>> stopped; b) the self-tracing means that the evaluation of these
>> expressions would introduce noise in the data; c) the overhead of all
>> the extra packets(?).
>>
>> So, I would be in favor of an lldb-server-based approach. I'm not
>> telling you that you shouldn't do that, but I don't think that's an
>> approach I would take...
>>
>> pl
>>
>>
>> On 1 February 2016 at 08:58, Ravitheja Addepally
>> <ravithejaw...@gmail.com> wrote:
>> > Ok, that is one option, but one of the aims of this activity is to make
>> > the data available for use by IDEs like Android Studio or Xcode or any
>> > other that may want to display this information in its environment.
>> > Keeping that in consideration, would the completely python-based
>> > approach be useful? Or would providing LLDB APIs to extract raw perf
>> > data from the target be more useful?
>> >
>> > On Thu, Jan 21, 2016 at 10:00 PM, Greg Clayton <gclay...@apple.com>
>> > wrote:
>> >>
>> >> One thing to think about is that you can actually just run an expression
>> >> in the program that is being debugged without needing to change anything
>> >> in the GDB remote server, so this can all be done via python commands and
>> >> would require no changes to anything. You can run an expression to enable
>> >> the buffer, and since LLDB supports multi-line expressions that can define
>> >> their own local variables and local types, the expression could be
>> >> something like:
>> >>
>> >> int perf_fd = (int)perf_event_open(...);  // args elided here
>> >> struct PerfData
>> >> {
>> >>     void *data;
>> >>     size_t size;
>> >> };
>> >> // read_perf_data stands in for whatever helper copies the trace out
>> >> PerfData result = read_perf_data(perf_fd);
>> >> result  // the last expression is the value handed back to the caller
>> >>
>> >>
>> >> The result is then a structure that you can access from your python
>> >> command (it will be an SBValue), and then you can read memory in order
>> >> to get the perf data.
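>> >>
>> >> For example, a rough python-side sketch of that flow (the names here are
>> >> illustrative, not an existing API; the expression is the one above, with
>> >> its arguments still elided):
>> >>
>> >> import lldb
>> >>
>> >> def read_perf_buffer(frame):
>> >>     # Run the multi-line expression; LLDB hands back an SBValue.
>> >>     value = frame.EvaluateExpression('''
>> >>         int perf_fd = (int)perf_event_open(...);  // args elided as above
>> >>         struct PerfData { void *data; size_t size; };
>> >>         PerfData result = read_perf_data(perf_fd);
>> >>         result''')
>> >>     addr = value.GetChildMemberWithName('data').GetValueAsUnsigned()
>> >>     size = value.GetChildMemberWithName('size').GetValueAsUnsigned()
>> >>     # Read the raw perf data out of the inferior's memory.
>> >>     error = lldb.SBError()
>> >>     return frame.GetThread().GetProcess().ReadMemory(addr, size, error)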
>> >>
>> >> You can also split things up into multiple calls, where you can run
>> >> perf_event_open() on its own and return the file descriptor:
>> >>
>> >> (int)perf_event_open(...)
>> >>
>> >> This expression will return the file descriptor.
>> >>
>> >> Then you could allocate memory via the SBProcess:
>> >>
>> >> (void *)malloc(1024);
>> >>
>> >> The result of this expression will be the buffer that you use...
>> >>
>> >> Then you can read 1024 bytes at a time into this newly created buffer.
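>> >>
>> >> A sketch of that read loop (buffer_addr comes from the malloc expression
>> >> above; total_size and process are assumed to be known from the setup):
>> >>
>> >> CHUNK = 1024
>> >> error = lldb.SBError()
>> >> raw = bytearray()
>> >> for offset in range(0, total_size, CHUNK):
>> >>     # Pull the trace data out of the target 1024 bytes at a time.
>> >>     chunk = process.ReadMemory(buffer_addr + offset, CHUNK, error)
>> >>     if error.Fail():
>> >>         break
>> >>     raw += chunk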
>> >>
>> >> So a solution that is completely done in python would be very
>> >> attractive.
>> >>
>> >> Greg
>> >>
>> >>
>> >> > On Jan 21, 2016, at 7:04 AM, Ravitheja Addepally
>> >> > <ravithejaw...@gmail.com> wrote:
>> >> >
>> >> > Hello,
>> >> >       Regarding the questions in this thread, please find the answers ->
>> >> >
>> >> > How are you going to present this information to the user? (I know
>> >> > debugserver can report some performance data... Have you looked into
>> >> > how that works? Do you plan to reuse some parts of that
>> >> > infrastructure?) and How will you get the information from the server
>> >> > to
>> >> > the client?
>> >> >
>> >> > Currently I plan to show a list of the instructions that have been
>> >> > executed so far. I saw the implementation suggested by Pavel; the
>> >> > existing infrastructure is a little bit lacking for the needs of this
>> >> > project, but I plan to follow a similar approach, i.e. to extract the
>> >> > raw trace data by querying the server (which can use perf_event_open
>> >> > to get the raw trace data from the kernel) and transport it through
>> >> > gdb packets (the qXfer packets,
>> >> >
>> >> > https://sourceware.org/gdb/onlinedocs/gdb/Branch-Trace-Format.html#Branch-Trace-Format).
>> >> > On the client side the raw trace data could be passed on to a
>> >> > python-based command that could decode the data. This also eliminates
>> >> > the dependency on libipt, since LLDB would not decode the data itself.
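>> >> >
>> >> > As an illustration only (lldb-server does not implement these packets
>> >> > today; the packet layout follows the gdb documentation linked above,
>> >> > and the python function is hypothetical):
>> >> >
>> >> > import lldb
>> >> >
>> >> > def fetch_raw_trace(debugger):
>> >> >     # Ask the remote stub for raw branch-trace data via a qXfer packet,
>> >> >     # 'qXfer:btrace:read:annex:offset,length' per the gdb docs.
>> >> >     interp = debugger.GetCommandInterpreter()
>> >> >     result = lldb.SBCommandReturnObject()
>> >> >     interp.HandleCommand(
>> >> >         'process plugin packet send qXfer:btrace:read:all:0,fff', result)
>> >> >     return result.GetOutput()  # raw payload, decoded later by e.g. libipt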
>> >> >
>> >> > There is also the question of this third party library.  Do we take a
>> >> > hard dependency on libipt (probably a non-starter), or only use it if
>> >> > it's
>> >> > available (much better)?
>> >> >
>> >> > With the above mentioned approach LLDB would not need the library;
>> >> > whoever wants to use the python command would have to install it
>> >> > separately, but LLDB won't need it.
>> >> >
>> >> > With the performance counters, the interface would still be
>> >> > perf_event_open, so if there were a perf_wrapper in lldb-server it
>> >> > could be reused to configure and use the software performance counters
>> >> > as well; you would just need to pass different attributes in the
>> >> > perf_event_open system call. I also think the perf_wrapper could be
>> >> > reused to get CoreSight information (see
>> >> > https://lwn.net/Articles/664236/).
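>> >> >
>> >> > For example, on Linux the PMU type id to put in perf_event_attr.type
>> >> > for the trace hardware is published in sysfs, so a wrapper could pick
>> >> > it up like this (sketch; the path is the one Intel PT registers):
>> >> >
>> >> > def intel_pt_pmu_type():
>> >> >     # Dynamic PMUs publish their id under /sys/bus/event_source/devices/.
>> >> >     with open('/sys/bus/event_source/devices/intel_pt/type') as f:
>> >> >         return int(f.read())  # -> perf_event_attr.type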
>> >> >
>> >> >
>> >> > On Wed, Oct 21, 2015 at 8:57 PM, Greg Clayton <gclay...@apple.com>
>> >> > wrote:
>> >> > One main benefit of doing this externally is that it allows this to be
>> >> > done remotely over any debugger connection. If you can run expressions
>> >> > to enable/disable/set up the memory buffer and access the buffer
>> >> > contents, then you don't need to add code into the debugger to actually
>> >> > do this.
>> >> >
>> >> > Greg
>> >> >
>> >> > > On Oct 21, 2015, at 11:54 AM, Greg Clayton <gclay...@apple.com>
>> >> > > wrote:
>> >> > >
>> >> > > IMHO the best way to provide this information is to implement
>> >> > > reverse-debugging packets in a GDB server (lldb-server). You would
>> >> > > enable this feature via some packet to lldb-server, and that would
>> >> > > start the gathering of data, keeping the last N instructions run by
>> >> > > all threads in some buffer that gets overwritten. The lldb-server
>> >> > > enables it and gives a buffer to the perf_event_interface(). Then
>> >> > > clients can ask the lldb-server to step back in any thread. Only when
>> >> > > the data is requested do we actually use it to implement the reverse
>> >> > > stepping.
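>> >> > >
>> >> > > (For reference, the gdb remote protocol already reserves 'bs' and
>> >> > > 'bc' for backward step and backward continue, so the exchange could
>> >> > > look roughly like this, framing and checksums omitted:
>> >> > >
>> >> > > send: bs                # step the selected thread back one instruction
>> >> > > recv: T05thread:01;     # ordinary stop reply
>> >> > >
>> >> > > though lldb-server would still need to implement them.)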
>> >> > >
>> >> > > Another way to do this would be to use a python-based command that can
>> >> > > be added to any target that supports this. The plug-in could
>> >> > > install a set
>> >> > > of LLDB commands. To see how to create new lldb command line
>> >> > > commands in
>> >> > > python, see the section named "CREATE A NEW LLDB COMMAND USING A
>> >> > > PYTHON
>> >> > > FUNCTION" on the http://lldb.llvm.org/python-reference.html web
>> >> > > page.
>> >> > >
>> >> > > Then you can have some commands like:
>> >> > >
>> >> > > intel-pt-start
>> >> > > intel-pt-dump
>> >> > > intel-pt-stop
>> >> > >
>> >> > > Each command could have options and arguments as desired. The
>> >> > > "intel-pt-start" command could enable the feature in the target by
>> >> > > running an expression that makes the perf_event_interface calls that
>> >> > > would allocate some memory and hand it to the Intel PT stuff. The
>> >> > > "intel-pt-dump" could just give a raw dump of all the history for one
>> >> > > or more threads (again, add options and arguments as needed to this
>> >> > > command). The python code could bridge to C and use the intel
>> >> > > libraries that know how to process the data.
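>> >> > >
>> >> > > A skeleton of that plug-in, following the python-reference page (the
>> >> > > module name 'intel_pt' and the stub bodies are just placeholders):
>> >> > >
>> >> > > import lldb
>> >> > >
>> >> > > def intel_pt_start(debugger, command, result, internal_dict):
>> >> > >     # Would run the perf_event_open()/mmap expressions in the target.
>> >> > >     result.PutCString('intel-pt tracing started (stub)')
>> >> > >
>> >> > > def intel_pt_dump(debugger, command, result, internal_dict):
>> >> > >     # Would fetch the raw buffer and hand it to the Intel decoder.
>> >> > >     result.PutCString('intel-pt dump (stub)')
>> >> > >
>> >> > > def intel_pt_stop(debugger, command, result, internal_dict):
>> >> > >     result.PutCString('intel-pt tracing stopped (stub)')
>> >> > >
>> >> > > def __lldb_init_module(debugger, internal_dict):
>> >> > >     # Registers intel-pt-start, intel-pt-dump and intel-pt-stop.
>> >> > >     for name in ('start', 'dump', 'stop'):
>> >> > >         debugger.HandleCommand(
>> >> > >             'command script add -f intel_pt.intel_pt_%s intel-pt-%s'
>> >> > >             % (name, name))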
>> >> > >
>> >> > > If this all goes well we can think about building it into LLDB as a
>> >> > > built-in command.
>> >> > >
>> >> > >
>> >> > >> On Oct 21, 2015, at 9:50 AM, Zachary Turner via lldb-dev
>> >> > >> <lldb-dev@lists.llvm.org> wrote:
>> >> > >>
>> >> > >> There are two different kinds of performance counters: OS performance
>> >> > >> counters and CPU performance counters.  It sounds like you're talking
>> >> > >> about the latter, but it's worth considering whether this could be
>> >> > >> designed in a way to support both (i.e. even if you don't do both
>> >> > >> yourself, at least make the machinery reusable and applicable to both,
>> >> > >> for when someone else wants to come through and add OS perf counters).
>> >> > >>
>> >> > >> There is also the question of this third party library.  Do we
>> >> > >> take a
>> >> > >> hard dependency on libipt (probably a non-starter), or only use it
>> >> > >> if it's
>> >> > >> available (much better)?
>> >> > >>
>> >> > >> As Pavel said, how are you planning to present the information to
>> >> > >> the
>> >> > >> user?  Through some sort of top level command like "perfcount
>> >> > >> instructions_retired"?
>> >> > >>
>> >> > >> On Wed, Oct 21, 2015 at 8:16 AM Pavel Labath via lldb-dev
>> >> > >> <lldb-dev@lists.llvm.org> wrote:
>> >> > >> [ Moving this discussion back to the list. I pressed the wrong
>> >> > >> button
>> >> > >> when replying.]
>> >> > >>
>> >> > >> Thanks for the explanation Ravi. It sounds like a very useful
>> >> > >> feature
>> >> > >> indeed. I've found a reference to the debugserver profile data in
>> >> > >> GDBRemoteCommunicationClient.cpp:1276, so maybe that will help
>> >> > >> with
>> >> > >> your investigation. Maybe also someone more knowledgeable can
>> >> > >> explain
>> >> > >> what those A packets are used for (?).
>> >> > >>
>> >> > >>
>> >> > >> On 21 October 2015 at 15:48, Ravitheja Addepally
>> >> > >> <ravithejaw...@gmail.com> wrote:
>> >> > >>> Hi,
>> >> > >>>   Thanks for your reply. Some of the future processors to be
>> >> > >>> released by Intel have hardware support for recording the
>> >> > >>> instructions that were executed by the processor, and this recording
>> >> > >>> process is quite fast and does not add much computational load. This
>> >> > >>> hardware is made accessible via the perf_event_interface, where one
>> >> > >>> can map a region of memory for this purpose by passing it as an
>> >> > >>> argument to this interface. The recorded instructions are then
>> >> > >>> written to the assigned memory region. This is basically the raw
>> >> > >>> information which can be obtained from the hardware. It can be
>> >> > >>> interpreted and presented to the user in the following ways ->
>> >> > >>>
>> >> > >>> 1) Instruction history - where the user gets basically a list of all
>> >> > >>> instructions that were executed
>> >> > >>> 2) Function call history - it is also possible to get a list of all
>> >> > >>> the functions called in the inferior
>> >> > >>> 3) Reverse debugging with limited information - in GDB this is only
>> >> > >>> the functions executed.
>> >> > >>>
>> >> > >>> This raw information also needs to be decoded (even before you can
>> >> > >>> disassemble it); there is already a library released by Intel called
>> >> > >>> libipt which can do that. At the moment we plan to work with
>> >> > >>> instruction history. I will look into the debugserver infrastructure
>> >> > >>> and get back to you. I guess for the server-client communication we
>> >> > >>> would rely on packets only. In case of concerns about too much data
>> >> > >>> being transferred, we can limit the number of entries we report,
>> >> > >>> because the amount of data recorded is anyway too big to present all
>> >> > >>> at once, so we would have to resort to something like a viewport.
>> >> > >>>
>> >> > >>> Since a lot of instructions can be recorded this way, the function
>> >> > >>> call history can be quite useful for debugging, especially since it
>> >> > >>> is a lot faster to collect function traces this way.
>> >> > >>>
>> >> > >>> -ravi
>> >> > >>>
>> >> > >>> On Wed, Oct 21, 2015 at 3:14 PM, Pavel Labath <lab...@google.com>
>> >> > >>> wrote:
>> >> > >>>>
>> >> > >>>> Hi,
>> >> > >>>>
>> >> > >>>> I am not really familiar with the perf_event interface (and I
>> >> > >>>> suspect others aren't either), so it might help if you explain what
>> >> > >>>> kind of information you plan to collect from there.
>> >> > >>>>
>> >> > >>>> As for the PtraceWrapper question, I think that really depends
>> >> > >>>> on
>> >> > >>>> bigger design decisions. My two main questions for a feature
>> >> > >>>> like
>> >> > >>>> this
>> >> > >>>> would be:
>> >> > >>>> - How are you going to present this information to the user? (I
>> >> > >>>> know
>> >> > >>>> debugserver can report some performance data... Have you looked
>> >> > >>>> into
>> >> > >>>> how that works? Do you plan to reuse some parts of that
>> >> > >>>> infrastructure?)
>> >> > >>>> - How will you get the information from the server to the
>> >> > >>>> client?
>> >> > >>>>
>> >> > >>>> pl
>> >> > >>>>
>> >> > >>>>
>> >> > >>>> On 21 October 2015 at 13:41, Ravitheja Addepally via lldb-dev
>> >> > >>>> <lldb-dev@lists.llvm.org> wrote:
>> >> > >>>>> Hello,
>> >> > >>>>>       I want to implement support for reading performance
>> >> > >>>>> measurement information using the perf_event_open system call. The
>> >> > >>>>> motive is to add support for the Intel PT hardware feature, which
>> >> > >>>>> is available through the perf_event interface. I was thinking of
>> >> > >>>>> implementing a new wrapper like PtraceWrapper in the
>> >> > >>>>> NativeProcessLinux files. My question is: is this the correct place
>> >> > >>>>> to start? If not, could someone suggest another place to begin?
>> >> > >>>>>
>> >> > >>>>> BR,
>> >> > >>>>> A Ravi Theja
>> >> > >>>>>
>> >> > >>>
>> >> > >>>
>> >> > >
>> >> >
>> >> >
>> >>
>> >
>
>
_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
