Re: GSOC 2018 - Textual LTO dump tool project

Hrishikesh Kulkarni Fri, 02 Mar 2018 01:26:33 -0800

Hello everyone,


Thanks for your suggestions and engaging response.

Based on the feedback, I think the scope of this project comprises the
following three indicative actions:

1. Creating a separate driver, i.e. a separate dump tool that uses the LTO
object API for reading the LTO file.

2. Extending LTO dump infrastructure:

GCC already seems to have dump infrastructure for pretty-printing tree
nodes, gimple statements etc. However, I suppose we’d need to extend that
for dumping pass summaries? For instance, should we add a new hook, say
“dump”, to ipa_opt_pass_d that would dump the pass summary?

3. Refactoring streaming API - Could you please elaborate on what
improvements could be made to the streaming API? Would it be a good idea
to make it more “C++ style”, similar to the iostream interface? Also, while
going through ipa-cp/ipa-prop I noticed the following in
ipa_prop_read_functions(), which looks like some kind of “preamble” for
setting up the header to read the summary. Perhaps this could be abstracted
into the streaming API too?

  const struct lto_function_header *header =
    (const struct lto_function_header *) data;
  const int cfg_offset = sizeof (struct lto_function_header);
  const int main_offset = cfg_offset + header->cfg_size;
  const int string_offset = main_offset + header->main_size;
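
To make the idea concrete, here is a minimal sketch of how the offset
arithmetic in that preamble could be gathered into one helper. The names
demo_function_header, demo_section_layout and demo_compute_layout are
hypothetical and exist only for illustration; the struct is a simplified
stand-in and not GCC's actual lto_function_header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GCC's lto_function_header; only the sizes
   used by the preamble are modelled here.  */
struct demo_function_header
{
  int32_t cfg_size;     /* Size of the CFG stream.  */
  int32_t main_size;    /* Size of the main stream.  */
  int32_t string_size;  /* Size of the string table.  */
};

/* Offsets of the three streams that follow the header in a section.  */
struct demo_section_layout
{
  size_t cfg_offset;
  size_t main_offset;
  size_t string_offset;
};

/* Compute all offsets in one place instead of repeating the
   arithmetic in every reader.  */
static struct demo_section_layout
demo_compute_layout (const struct demo_function_header *header)
{
  struct demo_section_layout layout;

  assert (header != NULL);
  layout.cfg_offset = sizeof (struct demo_function_header);
  layout.main_offset = layout.cfg_offset + (size_t) header->cfg_size;
  layout.string_offset = layout.main_offset + (size_t) header->main_size;
  return layout;
}
```

A reader function would then call demo_compute_layout once on the section
data and use the returned offsets, rather than open-coding the three
additions each time.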
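
For the “dump” hook raised in point 2, a rough sketch of the shape I have
in mind is below. Everything in it is an assumption for illustration:
demo_ipa_pass is a simplified C model and not the real ipa_opt_pass_d
(which is a C++ class), and the summary struct and function names are
made up.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified model of an IPA pass descriptor; GCC's real
   ipa_opt_pass_d is a C++ class with different members.  */
struct demo_ipa_pass
{
  const char *name;
  /* Proposed hook: print the pass summary in text form.  Returns the
     number of characters written, or a negative value on error.  */
  int (*dump) (FILE *out, const void *summary);
};

/* Hypothetical summary for a pure-const-like pass.  */
struct demo_pure_const_summary
{
  int pure_const_state;
  int state_previously_known;
};

/* Dump hook for the hypothetical summary above.  */
static int
demo_pure_const_dump (FILE *out, const void *summary)
{
  const struct demo_pure_const_summary *s
    = (const struct demo_pure_const_summary *) summary;

  return fprintf (out,
                  "pure_const_state = %d\nstate_previously_known = %d\n",
                  s->pure_const_state, s->state_previously_known);
}

/* Print a header line and call the hook when the pass provides one.
   Returns 0 for passes without a dump hook, otherwise the hook's
   return value.  */
static int
demo_dump_pass (FILE *out, const struct demo_ipa_pass *pass,
                const void *summary)
{
  assert (pass != NULL);
  if (pass->dump == NULL)
    return 0;
  fprintf (out, "%s pass:\n", pass->name);
  return pass->dump (out, summary);
}
```

The dump tool would then walk the registered passes and call
demo_dump_pass for each one, so passes without a hook are simply skipped.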

I would be grateful for suggestions on how to proceed further, especially
with modifying the makefiles to create the new driver. Unfortunately, I have
some school exams next week and won’t be able to work much on GCC during
that period.

Best Regards,

Hrishikesh


On Wed, Feb 28, 2018 at 4:05 PM, Martin Liška <mli...@suse.cz> wrote:

> On 02/25/2018 10:46 AM, Martin Jambor wrote:
> > Hello Hrishikesh,
> >
> > I apologize for replying to you this late, this has been a busy week
> > and now I am traveling.
> >
> > On Mon, Feb 19 2018, Hrishikesh Kulkarni wrote:
> >> Hi,
> >>
> >> I am Hrishikesh Kulkarni currently studying as an undergrad student in
> >> Computer Engineering at Pune University, India. I find compilers quite
> >> interesting as a subject, and would like to apply to GSoC to gain some
> >> understanding of how real-world compilers work. So far, I have managed to
> >> build gcc and perform some simple tweaks to the codebase. In particular, I
> >> would like to apply to the Textual LTO dump tool project.
> >>
> >
> > I must say I am impressed by the research you have already done.
> > Nevertheless, please note that Ray Kim has also expressed interest in
> > the project.  Martin Liska will be the mentor, so I will let him drive
> > the selection process.  On the other hand, Ray also liked another
> > project, so maybe he will pick that and everyone will be happy.
>
> Hello.
>
> I'm really happy that there are multiple volunteers who want to work on the
> LTO dump tool project. Based on what I have looked at, I would like to have
> Hrishikesh working on the project. He's got experience with C, C++ and also
> with the Python language, which can be well used for prototyping. Apart from
> that, he's spent quite some time investigating LTO internals in GCC.
>
> That said, may I please ask the other candidates to look for a different GSoC
> project that we offered? I believe the other topics are also interesting and
> important for the project.
>
> >
> >> As far as I understand, the motivation for LTO framework was to enable
> >> cross file interprocedural optimizations, and for this purpose an ipa pass
> >> is divided into following three stages:
> >>
> >>    1. LGEN: The pass does a local analysis of the function and generates a
> >>    “summary”, ie, the information relevant to the pass and writes it to LTO
> >>    object file.
> >
> > A pass might do that, but the output of the whole stage is not just the
> > pass summaries, it also writes the function IL (the function gimple
> > statements, above all) to the object file.
> >
> >>    2. WPA: The LTO object files are given as input to the linker, which then
> >>    invokes the lto1 frontend to perform global ipa analysis over the
> >>    call-graph and write optimized summaries to LTO object files
> >>    (partitioning). The global ipa analysis is done over summary and not the
> >>    actual function bodies.
> >
> > Well... note that partitioning actually means dividing the whole
> > compiled program/library into chunks that are then compiled
> > independently in the LTRANS stage.  But you are basically right that WPA
> > does also do whole-program analysis based on summaries and then writes
> > its decisions to optimization summaries, yes.
> >
> >>    3. LTRANS: The partitions are read back, and the function bodies are
> >>    reconstructed from summary and are then compiled to produce real object
> >>    files.
> >
> > Function bodies and the summaries are distinct things.  The body
> > consists of gimple statements and all the associated stuff (such as
> > types, so a lot of stuff), whereas when we refer to summaries, we mean
> > small chunks of data that interprocedural optimizations such as inlining
> > or IPA-CP scurry away because they cannot feasibly work on bodies of the
> > entire program.
> >
> > But apart from this terminology issue, you are basically correct, at the
> > LTRANS stage, IPA passes apply transformations to the bodies according
> > to the optimization summary generated by the WPA phase.  And then, all
> > normal, intra-procedural passes and code generation runs.
> >
> >>
> >>
> >> If I understand correctly, the motivation for textual LTO dump tool is to
> >> easily analyze contents of LTO object file, similar to readelf or objdump?
>
> Yes. Richi in previous email defined how that could be done.
>
> >
> > That is how I understand it too, but Martin may have some further uses
> > in mind.
> >
> >>
> >> Assume that LTO object file contains in pureconst section: 0b0110 (0b for
> >> binary prefix) corresponding to values of fs->pure_const_state and
> >> fs->state_previously_known.
> >>
> >> If I understand correctly, the output of dump tool should then be:
> >>
> >> pure_const pass:
> >>
> >> pure_const_state = IPA_PURE (enum value of pure_const_state_e corresponding
> >> to 0b01)
> >>
> >> state_previously_known = IPA_NEITHER (enum value of pure_const_state_e
> >> corresponding to 0b10)
> >>
> >> Is this the expected output of the dump tool ?
> >
> > I think the tool would have to do a bit more than just dumping summaries of
> > IPA passes.  I tend to think that the task should also include dumping
> > gimple bodies (but we already do that in GCC and so it should be mostly
> > easy) and also of types (that are merged as one of the first steps of
> > WPA, and interesting things happen during that merging).  And perhaps
> > quite a bit more.  Martin?
>
> Yes, as we transitioned to early-debug info in LTO mode, printing tree types
> that reside in LTO stream would help us to reduce the stream in the future.
>
> >
> >>
> >> I am reasonably familiar working with C, C++ and python. My prior
> >> experience includes opportunities to work in areas of NLP. Some of my
> >> accomplishments in the area include presenting project VicharDhara- A
> >> thought Mapper that was selected among top five ideas in Accenture
> >> Innovation Challenge among 7000 nationwide entries. My paper on this topic
> >> won the best paper award in IEEE Conference ICCUBEA-2017. My previous work
> >> was focused on simple parsers, student psychology, thought process
> >> detection for team selection.
> >
> > Interesting, congratulations.
> >
> >>
> >> In the interim, I have been through a few docs on GCC and LTO [1][2][3] and
> >> am trying to write a toy ipa pass to better understand LTO/IPA
> >> infrastructure.
> >
> > Great, I believe that's exactly what my advice would be
> >
> >> I would be grateful for feedback on the textual LTO dump
> >> tool.
> >
> > I hope that Martin will shed a bit more light on what output he
> > envisions the tool to have.  I will talk to him about it too when I get
> > back to the office (so maybe on Tuesday but probably on Wednesday).
>
> As mentioned above, this was described by Richard. The first step would be to
> provide a write-only mode, where lto-dump will only provide verbose
> information usable for debugging.
>
> Another topic is the current LTO dumping infrastructure. I know Honza does
> not like the interface. Maybe it can be improved with respect to bitpack_d
> and maybe some generalization can be done. Honza?
>
> Thanks,
> Martin
>
> >
> > Thanks,
> >
> > Martin
> >
> >
> >
> >>
> >> [1] http://www.ucw.cz/~hubicka/slides/labs2013.pdf
> >>
> >> [2] https://gcc.gnu.org/wiki/LinkTimeOptimization
> >>
> >> [3] https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html
> >>
> >> My two recent publications are listed below:
> >>
> >> [A] Hrishikesh Kulkarni, "Contextual Data Representation Using Prime Number
> >> Route Mapping Method and Ontology" IEEE Conference, ICCUBEA, 2017
> >>
> >> [B] Hrishikesh Kulkarni, “Multi-Graph based Intent Hierarchy Generation to
> >> Determine Action Sequence”, Springer Conference, ICDECT, December 2017, Pune
> >>
> >> Thanks,
> >>
> >> Hrishikesh Kulkarni
>
>
