[PATCH] D83088: Introduce CfgTraits abstraction

Nicolai Hähnle via Phabricator via cfe-commits Sun, 23 Aug 2020 23:58:42 -0700

nhaehnle added a comment.

>>>> The most immediate problem is divergence analysis, which is extremely 
>>>> complex and difficult to get right. If I had tried to fight the accidental 
>>>> complexity that comes with attempting to write such an algorithm as C++ 
>>>> templates in addition to the inherent complexity of the algorithm at the 
>>>> same time, I'm not sure I would have been able to produce anything 
>>>> workable at all.
>>>>
>>>> Frankly, I suspect that our dominator tree implementation also suffer 
>>>> because of this, though at least dominator trees are much more well 
>>>> studied in the academic literature, so that helps keep the inherent 
>>>> complexity under control.
>>>
>>> I'm totally open to discussing making APIs more usable, for sure - though 
>>> I'm thinking it's likely a concept (like containers in the C++ standard 
>>> library) might be the better direction.
>>>
>>> Perhaps some code samples showing how one would interact (probably not 
>>> whole algorithms - maybe something simple like generating a dot diagram for 
>>> a graph) with these things given different APIs (traits, concepts, and 
>>> runtime polymorphism) - and implementations of each kind too.
>>
>> Take a look here for example: 
>> https://github.com/nhaehnle/llvm-project/blob/715450fa7f968ceefaf9c3b04b47066866c97206/llvm/lib/Analysis/GenericConvergenceUtils.cpp#L499
>>  -- this is obviously still fairly simple, but it's an example of printing 
>> out the results of an analysis in a way that's generic over the underlying 
>> CFG and SSA form.
>
> I'm having trouble following this example - I'm not sure what the CfgPrinter 
> abstraction is/why it's first-class, and why this "print" function is calling 
> what look like mutation operations like "appendBlocks". I guess perhaps the 
> question is - what's it printing from and what's it printing to?
>
> Ah, I see, the "append" functions are accessors, of a sort. Returning a 
> container might be more clear than using an out parameter - alternatively, a 
> functor parameter (ala std::for_each) that is called for each element, that 
> can then be used to populate an existing container if desired, or to do 
> immediate processing without the need for an intermediate container.


The code is trying to strike a balance here in terms of performance. Since 
dynamic polymorphism is used, a functor-based traversal can't be inlined and so 
the number of indirect function calls increases quite a bit. There are a number 
of use cases where you really do want to just append successors or predecessors 
to a vectors, like during a graph traversal. An example graph traversal is 
here: 
https://github.com/nhaehnle/llvm-project/blob/controlflow-wip-v7/llvm/lib/Analysis/GenericConvergenceUtils.cpp#L329

> Though the printer abstraction still strikes me as a bit strange - especially 
> since it doesn't seem to be printing itself. This function was passed a 
> printer and a stream - the printer prints to the stream (perhaps it'd make 
> more sense for the printer to take the stream on construction) and the 
> function isn't passed the thing to print at all - that thing is accessed from 
> the printer. That seems fairly awkward to me - I'd expect a printing 
> operation to take a thing to be printed and a thing to print to.

Keep in mind that where those `printBlockName` / `printValue` methods are used, 
they will always be mixed in with printing of other things. So the alternative 
you're describing would result in code like:

  dbgs() << " currently at: ";  // explicit stream
  printer.printBlockName(block); // implicit stream
  dbgs() << '\n'; // explicit stream

I would argue that that ends up being more awkward because it mixes implicit 
streams with explicitly given streams.

We could perhaps add versions of the methods that return a `Printable` object, 
so that we can write:

  dbgs() << "currently at: " << printer.printableBlockName(block) << '\n';

By the way, the main motivation for making the CfgPrinter a separate object is 
that printing LLVM IR efficiently requires keeping a ModuleSlotTracker around. 
Splitting the CfgPrinter off from the CfgInterface allows the fast-path of code 
that doesn't want debug prints to not have to carry a reference to a 
ModuleSlotTracker around, even if it's just an empty std::unique_ptr. That's a 
comparatively small burden though, so I'd be fine with merging the CfgPrinter 
back into the CfgInterface.

> Perhaps setting aside the complexities of printing things - could you provide 
> an example of code, given a CfgGraph, that walks the graph - perhaps just 
> numbering the nodes/edges/etc to produce a dot graph? Showing what the code 
> would look like if it were passed a GraphTraits-implementing graph, a static 
> polymorphic CfgGraph, and a dynamically polymorphic GfgGraph - and also 
> showing what would be fandemantally possible with the CfgGraph that wouldn't 
> be possible with GraphTraits, if any such things exist (it's still unclear to 
> me whether CfgGraph has abstractions that don't exist in GraphTraits (eg: 
> could you write a CfgGraph over GraphTraits? or would that be impossible 
> because GraphTraits is missing concepts/CfgGraph doesn't apply to all 
> GraphTraits-grahs? what subset of GraphTraits graphs does CfgGraph cover?).

Repeating myself, the main difference is that CfgGraph has a notion of SSA 
values in addition to nodes / basic blocks. There are some other differences, 
but they're comparatively minor and more cosmetic.

As far as an example is concerned, see the link above. That particular 
algorithm is still on the simpler side and doesn't need SSA values, so you 
could implement it based on GraphTraits as well, in which case you'd replace 
the various

  iface.appendSuccessors(block, blockStack);

with something like:

  blockStack.insert(blockStack.end(), std::begin(llvm::children(block)), 
std::end(llvm::children(block)));

Apart from that, the code would be the same.

>> A statically polymorphic wrapper is here: 
>> https://github.com/nhaehnle/llvm-project/blob/715450fa7f968ceefaf9c3b04b47066866c97206/llvm/include/llvm/Analysis/GenericConvergenceUtils.h#L569
>>
>> The simple example might be bearable writing as a template, precisely 
>> because it's simple -- so only looking at simple examples is unlikely to 
>> really capture the motivation. Really what the motivation boils down to is 
>> stuff like this: 
>> https://github.com/nhaehnle/llvm-project/blob/controlflow-wip-v7/llvm/lib/Analysis/GenericUniformAnalysis.cpp
>>  -- I don't fancy writing all this as a template.
>>
>> Third motivation would essentially go away if C++ could type-check against 
>> traits in the way that Rust and other languages like it can -- but it can't, 
>> so here we are.
>
> I hesitate to write code that's more idiomatic in a language that isn't C++. 
> Agreed that long/complicated algorithms as templates aren't the best thing - 
> but sometimes can be quite suitable/idiomatic C++ (see the C++ standard 
> library).

I think there's a misunderstanding here. The code I'm writing based on 
CfgTraits would not be more idiomatic in Rust or any other language that I'm 
aware of. The point is that writing the code in the way that it seems you want 
it to be written would be feasible in Rust, but isn't feasible in C++. It's 
about how type-checking of generics/templates works in those languages.

Comparing to the C++ STL, the STL is significantly simpler on the aspect that 
matters here. The templates in the STL have comparatively simple contracts with 
their template parameters. In most cases, they only care about a very small 
number of things, like copy-/move-ability and comparison operators. The 
"interface" of the CfgTraits is broader and deeper. Repeating myself again, 
this is in large part because it also has a notion of SSA values, but there are 
other reasons, e.g. caring about printability to make debugging less painful.

> That said, I'd like to help make things more usable, for sure - but I'm not 
> sure/currently feeling like this might not be the best direction for 
> achieving that goal & I think some clear comparisons - even for overly 
> simplistic code, where the overhead of a more complex solution may not be 
> felt as acutely (actually, small examples might show syntactic overhead more 
> acutely - if it takes more code to do the same thing, when that code isn't 
> washed out by a lot of code that would be the same regardless of 
> implementation, it will hopefully be more obvious, rather than less), 
> hopefully it'll be more a more clear/concrete basis on which to discuss 
> relative design tradeoffs.

It's not really about needing to write more or less source code -- see the 
example above. It's about the fact that C++ can't type-check templates in a 
useful way. Templates are effectively duck-typed and type-checking only happens 
at instantiation, which makes their maintenance a pain over time. A nice 
example of the negative effects that this can have in practice is you end up 
with stuff like (from mlir/include/mlir/IR/Block.h):

  /// Print out the name of the block without printing its body.
  /// NOTE: The printType argument is ignored.  We keep it for compatibility
  /// with LLVM dominator machinery that expects it to exist.
  void printAsOperand(raw_ostream &os, bool printType = true);

Why is this method called `printAsOperand`, which makes no sense for a generic 
interface of a graph? And why does it have a `printType` argument? Presumably 
this happened because dominator trees were first written for IR and then 
genericized from there without taking the time to properly think about and 
define what the interface between generic dominator trees and their template 
parameter should be. So then an implementation detail of llvm::BasicBlock ended 
up leaking all over the place.

When I started out writing the algorithms around `CfgTraits`, I didn't even 
know what the right interface should be, and I expect that there may well still 
be tweaks to the interface going forward. Writing the code in the way that I'm 
writing it helps keeps us honest and prevents weird escapes like 
`printAsOperand`. Perhaps even more importantly, as @kuhar also suggested, the 
quality-of-life when writing and maintaining code is much improved because you 
get more useful type-checking.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83088/new/

https://reviews.llvm.org/D83088

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D83088: Introduce CfgTraits abstraction

Reply via email to