Re: [PATCH] Allow multiple modules with the same name to coexist in the module cache

Argyrios Kyrtzidis Tue, 01 Apr 2014 17:03:12 -0700

On Apr 1, 2014, at 3:32 PM, Richard Smith <[email protected]> wrote:


> On Tue, Apr 1, 2014 at 12:59 PM, Argyrios Kyrtzidis <[email protected]> 
> wrote:
> 
> On Apr 1, 2014, at 12:03 PM, Richard Smith <[email protected]> wrote:
> 
>> On Mon, Mar 31, 2014 at 8:35 PM, Argyrios Kyrtzidis <[email protected]> 
>> wrote:
>> 
>> On Mar 31, 2014, at 6:00 PM, Richard Smith <[email protected]> wrote:
>> 
>>> On Mon, Mar 31, 2014 at 5:25 PM, Argyrios Kyrtzidis <[email protected]> 
>>> wrote:
>>> 
>>> On Mar 31, 2014, at 4:44 PM, Richard Smith <[email protected]> wrote:
>>> 
>>>> On Mon, Mar 31, 2014 at 4:23 PM, Argyrios Kyrtzidis <[email protected]> 
>>>> wrote:
>>>> >
>>>> > What I'm suggesting is:
>>>> >
>>>> > 1) Drop the -I paths that are earlier than the module in the header 
>>>> > search path when building the module
>>>> > 2) Include the rest of the header search paths in the configuration hash 
>>>> > for the module
>>>> 
>>>> This will prevent us from sharing system modules across projects, and, as 
>>>> Ben already mentioned, will result in an explosion of module files, even 
>>>> within the same project.
>>>> 
>>>> Why? System modules would only have system include paths, and these would 
>>>> usually be the same across projects, right?
>>> 
>>> I see what you mean, but we also need to support the case where the system 
>>> framework developer can point to the framework in his local directory to be 
>>> used, instead of the system one, how do you suggest we allow this ?
>>> 
>>> I'd like for a module to be able to specify its own include paths. If a 
>>> module does so, the include paths of the host build should not be included 
>>> in its hash.
>> 
>> That didn't quite address my question, let me be more specific. We generally 
>> need to allow, as an example, system AppKit to import a user-local 
>> Foundation; neither system AppKit specifying its own include paths, nor 
>> looking up module dependencies of AppKit only in system include paths, allow 
>> this.
>> 
>> That's a curious use case.
>> 
>> It also seems to be compatible with the approach I suggested above, I think? 
>> You need to put your header search paths in the right order (so the lower 
>> layers of your system are below the higher layers), but that's generally 
>> desirable regardless. (We have a limitation that -isystem paths are 
>> considered after -I paths, which might get in the way here, but that's 
>> artificial and in conflict with what you're trying to do.)
> 
> What is the right order ?
> 
> If the order is:
> 
>       -I /local/path -isystem /System/Frameworks
> 
> and we drop -I paths earlier than the module then module import from AppKit 
> doesn’t consider -I /local/path.
> 
> If the order is:
> 
>       -isystem /System/Frameworks -I /local/path
> 
> then module import from AppKit finds Foundation in /System/Frameworks and 
> doesn’t consider -I /local/path.
> 
>> 
>> Alternatively, this seems like a good use case motivating putting the 
>> include path into the build configuration. The approach in this patch is a 
>> bad solution for this use case, because the AppKit modules for different 
>> Foundations will collide in the cache; any of the other solutions we've 
>> discussed seem to handle this better.
> 
> If we don’t drop -I paths earlier than the module then putting the ‘-I’ paths 
> in the build configuration will cause module explosion.
> If we drop -I paths earlier than the module then it doesn’t address the above 
> case and the worse thing is that you don’t anymore have one module hash cache 
> for the whole translation unit anymore, which creates complications (e.g. Ben 
> already mentioned the global module index).
> 
> We could maybe do something in-between, for example introduce an option for 
> search paths specific for module lookups. Normally you wouldn’t use them but 
> if you did use them then they would go in the build configuration and would 
> be considered first for module lookups. Then we could always do the "drop -I 
> paths earlier than the module”.
> 
>> 
>>>> Even if we include the search paths in the module hash we will still need 
>>>> to re-lookup the module dependencies before loading the module, because a 
>>>> new module.map may have showed up somewhere in the search paths since the 
>>>> time we built the module; unless I’m missing something, I don’t see any 
>>>> benefit in including the header search paths in the hash.
>>>> 
>>>> The hash should include everything that affects the way the module is 
>>>> built. If we're transferring include paths from the user of the module to 
>>>> the module build, the hash should include those paths, or two modules with 
>>>> different search paths could collide in the cache.
>>> 
>>> The collision is avoided by using the path to the module.map as Ben 
>>> proposes; if different search paths resolve to different module 
>>> dependencies, this is a case where we need to rebuild, but the same is true 
>>> if the same paths are used but a different module.map shows up as 
>>> dependency.
>>> 
>>> I don't see how that's sufficient -- the include paths can also affect how 
>>> non-modular #includes within the module behave.
>>> 
>>> As a somewhat-contrived example, suppose I have a library
>>> 
>>>   blah/
>>>     module.map
>>>     foo.h
>>>     mode1/
>>>       foo-impl.h
>>>     mode2/
>>>       foo-impl.h
>>> 
>>> ... where foo-impl.h is textually-included into foo.h, and this is supposed 
>>> to be used in two modes: one with a -I path pointing to mode1, and one with 
>>> a -I path pointing to mode2. We should ensure those two modules don't 
>>> collide in the cache. (Maybe mode1 and mode2 provide inline assembly for 
>>> different CPU variants, or such.)
>> 
>> 
>> Non-modular includes are problematic beyond just how to lookup the headers 
>> during module building. The major problem they introduce is that those same 
>> non-modular headers can be textually #included in the translation unit, 
>> before or after the module import that contains the same header contents, 
>> causing multiple definitions or other semantic issues. That's why we tend to 
>> prefer the clear semantic model that modules depend only in the headers that 
>> the module is comprised of, or other modules, and we want to encourage 
>> people to modularize their headers.
>> 
>> First, many non-modular includes are *intended* to be repeatedly textually 
>> included, and your argument does not apply to those cases (consider Clang's 
>> .def files, for instance). Some headers are implementation details of other 
>> headers and are only meant to be included once, into one place; those are 
>> similarly unaffected.
> 
> We already have the syntax in the module map for these; (I think) the syntax 
> is “exclude <header>”. This is the “escape hatch” to be explicit about what 
> non-modular, not-part-of-the-module, headers the module will include, and 
> subsequently it is your responsibility to make sure that <header> is not 
> textually included in the translation unit as well, or if it is that it will 
> not cause issues.
> 
>> 
>> Second, the problem you describe is mostly an artifact of our broken 
>> implementation of macro and declaration visibility across submodules of the 
>> same top-level module. Names in earlier-built submodule are visible in *all* 
>> later-built submodules, even those that don't import the earlier ones (and 
>> @imports in earlier submodules naming later ones have no effect). Once we 
>> fix that, the requirement to modularize bottom-up (and to "fix" the guarded 
>> typedefs in your system headers) mostly goes away -- it's generally fine to 
>> have the same entity declared in two different places, and we'll merge 
>> together those declarations if they're both visible.
> 
> Are you saying that if I have:
> 
> struct Foo {
> <stuff>
> };
> 
> @import Blah
> 
> and module ‘Blah’ has a definition for ‘struct Foo’ we will merge them 
> together ?
> 
> It depends on the language.
> 
> In C, the two definitions for 'Foo' are compatible types if they have 
> matching bodies, so they can be used interchangeably in that case and not 
> otherwise. We don't yet support this, because our compatible type code 
> doesn't know about modules, and this doesn't come up in non-modules code.
> 
> In C++, the two definitions for 'Foo' are required to be equivalent under the 
> ODR, and we will merge them (and to a limited degree, diagnose them if they 
> differ). This already basically works.

The "limited degree” makes me nervous here. We essentially would need to be 
able to take every definition/declaration in a C/C++/ObjC transition unit and 
merge them with definitions/declarations from modules, and make sure that we 
diagnose and error out on differences otherwise we may cause weird/confusing 
correctness issues.
This seems like a rather significant maintenance burden with lots of edge cases 
to worry about, and not as important if you have fully modular headers.

> 
>> I think we need to fix this submodule visibility problem regardless (the 
>> current state of play is extremely fragile to changes in the order we happen 
>> to build the submodules), so I'm not convinced that this is a long-term 
>> problem. (Nonetheless, I agree that having modularized dependencies is 
>> generally a good thing.)
> 
>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] Allow multiple modules with the same name to coexist in the module cache

Reply via email to