Re: [PATCH] Add stopgap option -fmodule-implementation-of

Ben Langmuir Wed, 30 Jul 2014 07:02:38 -0700

> On Jul 28, 2014, at 11:56 PM, Richard Smith <[email protected]> wrote:
> 
> On Mon, Jul 28, 2014 at 8:01 PM, Richard Smith <[email protected]> wrote:
> On Mon, Jul 28, 2014 at 6:25 PM, Ben Langmuir <[email protected]> wrote:
> 
>> On Jul 28, 2014, at 5:09 PM, Richard Smith <[email protected]> wrote:
>> 
>> On Mon, Jul 28, 2014 at 2:05 PM, Ben Langmuir <[email protected]> wrote:
>> 
>>> On Jul 24, 2014, at 6:58 PM, Richard Smith <[email protected]> wrote:
>>> 
>>> On Thu, Jul 24, 2014 at 7:56 AM, Ben Langmuir <[email protected]> wrote:
>>> 
>>>> On Jul 16, 2014, at 3:42 PM, Richard Smith <[email protected]> wrote:
>>>> 
>>>> On Fri, Jul 11, 2014 at 8:42 AM, Ben Langmuir <[email protected]> wrote:
>>>> Hey Richard,
>>>> 
>>>> Sorry to take so long to reply to this, but I am still interested in 
>>>> getting this stopgap into tree.
>>>> 
>>>> Sorry about the delay getting back to you!
>>>>  
>>>>> Please do not add a stopgap workaround to our stable and 
>>>>> backwards-compatible driver interface; just add it to -cc1 instead.
>>>> 
>>>> 
>>>> Sure.
>>>> 
>>>>> I don't see any relation between the flag's name and its functionality; 
>>>>> there seems to be no reason for this to be linked to the translation unit 
>>>>> being the implementation of any particular module (and if there were, 
>>>>> that's what -fmodule-name is for). Instead, I think what you're trying to 
>>>>> specify is that a particular module is included textually for this 
>>>>> compilation. Please pick a name that suggests that functionality instead.
>>>> 
>>>> 
>>>> In the abstract I agree with this, but the use case I have is only for TUs 
>>>> that are implementation files for a module and I know that is the only 
>>>> time that this flag will be used by our tools.  It is more useful for the 
>>>> diagnostic to say “don’t do this in the implementation of module Foo”, 
>>>> since that matches when the build system will be passing in this flag.  
>>>> Given that this doesn’t go into the driver, is this still an issue? If 
>>>> not, I can update and commit this patch, or can post it again for review 
>>>> if you prefer :-)
>>>> 
>>>> I'm fine with this as a short-term cc1-only flag. Longer-term I think we 
>>>> need to evaluate whether we can make the import-of-same-module cases "just 
>>>> work" (I think we can), and I hope this becomes unnecessary at that point.
>>> 
>>> r213767
>>> 
>>>> 
>>>>> >>> What’s unexpected to me is that changing a header whose contents are 
>>>>> >>> not usually visible may still require rebuilding all of my .cpp files.
>>>>> >>> module Foo { module One { header "One.h" } module Two { header 
>>>>> >>> "Two.h" } }
>>>>> >>>
>>>>> >>> // One.cpp - I don’t want to rebuild when Two.h changes
>>>>> >>> #import <Foo/One.h>
>>>>> >>>
>>>>> >>> Do we agree that this is unnecessary if submodules cannot 
>>>>> >>> accidentally be affected by changes in other submodules they don’t 
>>>>> >>> import (and we have some way to get the set of dependency files for 
>>>>> >>> just the submodule)?
>>>>> >>
>>>>> >>
>>>>> >> No, I don't agree with that. One.cpp might inline some function 
>>>>> >> definitions from Two.h, for instance. Or it might fail to build 
>>>>> >> because it declares something that conflicts with something in Two.h.
>>>>> >
>>>>> >
>>>>> > I feel like I’m missing something - how is that different from One.cpp 
>>>>> > having conflicts with some completely different header or module that 
>>>>> > is not imported into that particular TU?
>>>>> 
>>>>> If you import any part of a module, you have the whole module as part of 
>>>>> your translation unit, even though only some of it might be visible. Thus 
>>>>> we will diagnose your declarations that conflict with unimported portions 
>>>>> of an imported module.
>>>>> 
>>>> 
>>>> Maybe we need to have this discussion on cfe-dev at some point.  I think 
>>>> we need a driver flag to control whether clang reports headers from 
>>>> unimported submodules as dependencies, which will allow users/build 
>>>> systems to make the tradeoff.  As for the default, I strongly feel we 
>>>> shouldn't penalize build performance for correct code in order to 
>>>> guarantee that these particular ODR violations get diagnosed in 
>>>> incremental builds.  A full rebuild will still see any diagnostics and the 
>>>> subset of errors that this affects is not being diagnosed today with 
>>>> headers, so we’re still improving.
>>>> 
>>>> Conversely, I think that we should provide a guarantee that incremental 
>>>> and full builds produce bit-for-bit identical results. As you say, it's a 
>>>> tradeoff, but note that this isn't just about ODR violation checking -- 
>>>> the incremental approach you're suggesting can generate wrong code in some 
>>>> cases (we can inline a function definition from the old version of Two.h) 
>>>> -- so if we want to support this partial-rebuild mode, we'll need to be 
>>>> /very/ careful that we don't pull in any information from an unimported 
>>>> submodule in that mode.
>>> 
>>> Maybe you can help me understand how this would come about.  In our 
>>> documentation we say:
>>> 
>>>> Modules are modeled as if each submodule were a separate translation unit, 
>>>> and a module import makes names from the other translation unit visible
>>> 
>>> 
>>> Here’s my understanding:
>>> If I don’t import the submodule containing “Two.h”, then I shouldn’t get 
>>> its definitions in my TU.
>>> 
>>> You get its definitions in your *program*. If you import any part of a 
>>> module, the entire module is part of your program. Example:
>> 
>> Okay, but that’s just more consistency checking, isn’t it?  If I import 
>> Module1.B, but not Module1.A (or Module2.C) I don’t want to see “f” in my 
>> exported symbols.
>> 
>> I think you're saying that it would in principle be possible for us to 
>> accept the example I gave? It probably would, but the fact that we reject it 
>> right now is a feature, not a bug.
> 
> Agreed, although I think we weigh its benefit vs incremental building 
> differently.
> 
>>> Module1.A:
>>> int f(int);
>>> 
>>> Module1.B:
>>> extern int n;
>>> 
>>> Module2.C:
>>> import Module1.B;
>>> void f(int); // error, conflicting return type
>>> 
>>> If I have an inline declaration for a function in Two, then I still need to 
>>> have a definition in my own TU because of inline.  If I have a non-inline 
>>> decl, then Two can’t have an inline decl and if it has a definition for the 
>>> function not marked inline then having that definition show up in my TU 
>>> would lead to multiple definitions if Two is imported somewhere else.
>>> 
>>> You can get into this situation with C++ templates. You might only be able 
>>> to see a declaration of a template, where another submodule provides a 
>>> definition that is hidden but still available for inlining. This doesn't 
>>> violate any language rule as long as there's an explicit instantiation of 
>>> the template somewhere.
>> 
>> If I don’t see a definition in my TU, how can I use the template in a way 
>> affected by inlining?
>> 
>> You do "see" a definition in your TU, for some value of "see". That 
>> definition *is* imported, and is known about by the compiler; we just give 
>> you an error if you try to use it. CodeGen is still able to emit it. This is 
>> necessary to support entities that are imported by a module but not 
>> re-exported.
>> 
>> Consider this:
>> 
>> Module X:
>>   inline int f() { return 0; }
>> Module Y:
>>   import X; // not re-exported
>>   inline int g() { return f(); }
>> Z.cc:
>>   import Y;
>>   int k = g();
>> 
>> In Z.cc, we are *required* to emit the body of 'f', even though you can't 
>> "see" it.
> 
> Okay, that makes sense.  This is certainly something we would need to account 
> for to do safe incremental rebuilding.  I think the right answer is to make 
> sure that the transitive imports get included in the reported dependencies 
> regardless of being re-exported.
> 
>> And entities in X are treated just like entities in an unimported submodule 
>> of Y.
> 
> Ah.  This seems like an accident of the implementation rather than a 
> desirable property.  We have two distinct cases:
> 
> 1) A imports B, and B is not re-exported.  B’s headers are still dependencies 
> for our TU even though they aren’t visible.
> 2) A has submodules B and C.  Importing A.B does not create a dependency on 
> A.C or vice versa.
> 
> I think you mean "should not" rather than "does not" here: under the current 
> implementation, it certainly does, in that the contents of A.C can affect 
> whether a user of A.B builds today.


Right.

> Even then (as you note above) we have a trade-off here; there are benefits to 
> having that dependency.
> 
>> I may not have an instantiation of a template, but I still need to see its 
>> definition.  If its definition changes, that would require rebuilding the 
>> other TU that has the instantiation.  I’m probably being thick, but I still 
>> don’t see the issue here.
>> 
>> 
>>> You can also get into this situation with the C99 inline rules, where you 
>>> don't have to define an 'inline' function in every translation unit.
>> 
>> Did this change in C11, or am I misreading this?
>> 6.7.4.7: For a function with external linkage, the following restrictions 
>> apply: If a function is declared with an inline function specifier, then it 
>> shall also be defined in the same translation unit.
>> 
>> That rule applies only if the function is declared with the 'inline' 
>> specifier in that translation unit. Example:
>> 
>> Module X.A:
>>   extern int f(void); // ok, no 'inline', no definition required in this TU
>> Module X.B:
>>   inline int f() { return 0; } // ok, definition
>> main.cc:
>>   import X.A;
>>   int main() { return f(); }
>> 
>> In this setup, f() might get inlined into main, even though the definition 
>> is not visible. (FWIW, I expect we'll also generate wrong code in this case, 
>> because we'll emit a strong definition of 'f' from every TU that imports X; 
>> conversely, if X.A and X.B are split into separate top-level modules, then a 
>> TU that imports both will not emit a strong definition of 'f'.)
> 
> I don’t think this is a good idea at all.  I’m okay with saying that you’re 
> not allowed to have conflicting submodules, but having them create implicit 
> dependencies like this violates my mental model for semantic import.  I would 
> much prefer that X.A and X.B behave the same as top-level modules (except 
> that importing X might implicitly pull in A and/or B), and I think that would 
> be much less surprising.
> 
> I used to think the same thing, but I don't any more. I think there is value 
> in being able to say that a collection of submodules together forms some 
> coherent, logically-indivisible whole (call it a "library", maybe?), where 
> the submodules just provide visibility control over the pieces of that 
> library. Right now, we also couple that to two other things: the identity of 
> the "library", and the .pcm file structure, are both determined by the 
> top-level module name. I'm not convinced that's a good idea -- there are 
> certainly cases where it makes sense to have more granularity than that.
> 
> If we could decouple this "same library" / "same .pcm file" decision from the 
> top-level module name, so that you could say "X.A and X.B are notionally 
> separate (and live in distinct libraries / .pcm files)", would that address 
> your concern?
> 
> I asked something more specific than what I really wanted to know here. In 
> Clang's current implementation, the top-level module that contains a given 
> module affects a lot of things. In your X.A / X.B example, which properties 
> do you want? Off the top of my head:
> 
>  1. X.A and X.B are placed into the same .pcm file
>   1a. That .pcm file doesn't contain any other top-level module
>  2. X.A and X.B are both part of any TU / program that uses either of them
>  3. X.A and X.B have names starting with the same prefix
>  4. X.A and X.B are notionally in the same "layer", so there's no need to 
> think about dependency cycles with other modules

I’m not sure what (4) means.

>  5. X.A and X.B are always built together
> 

In my mind, a submodule should have a special relationship with its parent and 
its children, but not with its siblings.  Importing a module may imply 
importing its children, but importing X.A does not imply importing X.B (unless 
X.A transitively had such an import statement in it).

So (2) would specifically not be desired and (1) and (5) would be 
implementation details as long as we are producing modules on-demand.  Now, to 
support explicitly generating pcms, I think allowing submodules to be built in 
separate pcm files would be useful.

That being said, I think saying that the contents of all available submodules 
of a common ancestor should be compatible (no diagnostic required) would be 
fine, if that helps us diagnose more problems.
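
A rough sketch of the parent/child/sibling rule above, combined with case (1) 
from earlier (imports stay dependencies whether or not they are re-exported). 
This is a toy model only, not Clang code: Module, ModuleGraph, 
isSelfOrDescendant, collect and dependencyHeaders are hypothetical names, and 
it assumes submodules are identified by dotted names such as "X.A".

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one (sub)module; not a Clang data structure.
struct Module {
  std::vector<std::string> headers;  // headers owned by this (sub)module
  std::vector<std::string> imports;  // modules it imports, re-exported or not
};

// Dotted module name (e.g. "X.A") -> module description.
using ModuleGraph = std::map<std::string, Module>;

// True if `name` is `root` itself or nested below it ("X.A.Inner" under "X.A").
bool isSelfOrDescendant(const std::string &name, const std::string &root) {
  if (name == root) return true;
  return name.size() > root.size() &&
         name.compare(0, root.size(), root) == 0 &&
         name[root.size()] == '.';
}

// Gather headers of `root`, of its children, and of everything those import
// transitively. Siblings of `root` are never visited unless something
// reachable imports them explicitly.
void collect(const ModuleGraph &graph, const std::string &root,
             std::set<std::string> &visited, std::set<std::string> &deps) {
  if (!visited.insert(root).second) return;  // already handled; also breaks cycles
  for (const auto &entry : graph) {
    if (!isSelfOrDescendant(entry.first, root)) continue;
    deps.insert(entry.second.headers.begin(), entry.second.headers.end());
    for (const auto &imported : entry.second.imports)
      collect(graph, imported, visited, deps);
  }
}

// Headers a TU would report as dependencies after importing `imported`.
std::set<std::string> dependencyHeaders(const ModuleGraph &graph,
                                        const std::string &imported) {
  std::set<std::string> visited, deps;
  collect(graph, imported, visited, deps);
  return deps;
}
```

For example, if X.A owns A.h, X.B owns B.h and imports W, and W owns W.h, then 
importing X.A reports only A.h, importing X.B reports B.h and W.h, and 
importing X reports all three. Under the current implementation, as discussed 
above, importing either submodule effectively depends on everything in X.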



> (FWIW, I don't think it makes sense for all these things to be tied to the 
> choice of top-level module name.)
> 
> Another point that seems relevant is that implicit module builds are a bad 
> idea in a lot of situations. They don't distribute well, they rely on 
> side-channels for sharing module files, they break existing build system 
> assumptions, they require multiple compile actions to block waiting for each 
> other, and so on. A better approach, which we should be encouraging people to 
> use, is to make the module build step explicit in the build system. Once we 
> treat "building a module" as a build step with its own dependencies (which is 
> in turn depended on by downstream .cpp and module builds), this incremental 
> rebuild approach becomes rather problematic.
> 
> Finally, a point I've raised before is that hermetic builds are important to 
> a lot of people: for build reproducibility, cacheability, and so on, it's 
> important that your build does *not* depend on the path of builds you did 
> previously.
> 
> Both these points would be addressed by splitting your X.A and X.B builds up 
> so they built separate .pcm files.
> 
> What happens when I provide an incompatible external definition of “f()” in 
> another TU?  We can’t diagnose the conflict
> 
> There is no conflict; the C standard says that the implementation gets to 
> pick whichever one it likes.
> 
> Eventually, I'd like for us to include some IR (representing inline function 
> definitions and so on) in the module file, to remove the cost of repeatedly 
> generating IR for inline functions within modules. I don't think we want the 
> complexity of segregating that IR on the basis of frontend name visibility 
> rules.
> 
> and we will be calling the inline definition from a module we didn’t import 
> (from the user’s perspective).  Seems at least as bad as the other conflicts 
> we’ve talked about :-)
> 
> If you actually want the inlining, just make the inline definition visible, 
> or turn on LTO.
> 
> Conversely, if you actually want separate entities from a dependency point of 
> view, just make different module files for them.
>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
