Re: [PATCH] Allow multiple modules with the same name to coexist in the module cache

Argyrios Kyrtzidis Tue, 01 Apr 2014 18:51:07 -0700

On Apr 1, 2014, at 5:13 PM, Richard Smith <[email protected]> wrote:


> On Tue, Apr 1, 2014 at 4:55 PM, Argyrios Kyrtzidis <[email protected]> 
> wrote:
> 
> On Apr 1, 2014, at 3:32 PM, Richard Smith <[email protected]> wrote:
> 
>> On Tue, Apr 1, 2014 at 12:59 PM, Argyrios Kyrtzidis <[email protected]> 
>> wrote:
>> 
>> On Apr 1, 2014, at 12:03 PM, Richard Smith <[email protected]> wrote:
>> 
>>> On Mon, Mar 31, 2014 at 8:35 PM, Argyrios Kyrtzidis <[email protected]> 
>>> wrote:
>>> 
>>> On Mar 31, 2014, at 6:00 PM, Richard Smith <[email protected]> wrote:
>>> 
>>>> On Mon, Mar 31, 2014 at 5:25 PM, Argyrios Kyrtzidis <[email protected]> 
>>>> wrote:
>>>> 
>>>> On Mar 31, 2014, at 4:44 PM, Richard Smith <[email protected]> wrote:
>>>> 
>>>>> On Mon, Mar 31, 2014 at 4:23 PM, Argyrios Kyrtzidis <[email protected]> 
>>>>> wrote:
>>>>> >
>>>>> > What I'm suggesting is:
>>>>> >
>>>>> > 1) Drop the -I paths that are earlier than the module in the header 
>>>>> > search path when building the module
>>>>> > 2) Include the rest of the header search paths in the configuration 
>>>>> > hash for the module
>>>>> 
>>>>> This will prevent us from sharing system modules across projects, and, as 
>>>>> Ben already mentioned, will result in an explosion of module files, even 
>>>>> within the same project.
>>>>> 
>>>>> Why? System modules would only have system include paths, and these would 
>>>>> usually be the same across projects, right?
>>>> 
>>>> I see what you mean, but we also need to support the case where the system 
>>>> framework developer can point to the framework in his local directory to 
>>>> be used, instead of the system one, how do you suggest we allow this ?
>>>> 
>>>> I'd like for a module to be able to specify its own include paths. If a 
>>>> module does so, the include paths of the host build should not be included 
>>>> in its hash.
>>> 
>>> That didn't quite address my question, let me be more specific. We 
>>> generally need to allow, as an example, system AppKit to import a 
>>> user-local Foundation; neither system AppKit specifying its own include 
>>> paths, nor looking up module dependencies of AppKit only in system include 
>>> paths, allow this.
>>> 
>>> That's a curious use case.
>>> 
>>> It also seems to be compatible with the approach I suggested above, I 
>>> think? You need to put your header search paths in the right order (so the 
>>> lower layers of your system are below the higher layers), but that's 
>>> generally desirable regardless. (We have a limitation that -isystem paths 
>>> are considered after -I paths, which might get in the way here, but that's 
>>> artificial and in conflict with what you're trying to do.)
>> 
>> What is the right order ?
>> 
>> If the order is:
>> 
>>      -I /local/path -isystem /System/Frameworks
>> 
>> and we drop -I paths earlier than the module then module import from AppKit 
>> doesn’t consider -I /local/path.
>> 
>> If the order is:
>> 
>>      -isystem /System/Frameworks -I /local/path
>> 
>> then module import from AppKit finds Foundation in /System/Frameworks and 
>> doesn’t consider -I /local/path.
>> 
>>> 
>>> Alternatively, this seems like a good use case motivating putting the 
>>> include path into the build configuration. The approach in this patch is a 
>>> bad solution for this use case, because the AppKit modules for different 
>>> Foundations will collide in the cache; any of the other solutions we've 
>>> discussed seem to handle this better.
>> 
>> If we don’t drop -I paths earlier than the module then putting the ‘-I’ 
>> paths in the build configuration will cause module explosion.
>> If we drop -I paths earlier than the module then it doesn’t address the 
>> above case and the worse thing is that you don’t anymore have one module 
>> hash cache for the whole translation unit anymore, which creates 
>> complications (e.g. Ben already mentioned the global module index).
>> 
>> We could maybe do something in-between, for example introduce an option for 
>> search paths specific for module lookups. Normally you wouldn’t use them but 
>> if you did use them then they would go in the build configuration and would 
>> be considered first for module lookups. Then we could always do the "drop -I 
>> paths earlier than the module”.
>> 
>>> 
>>>>> Even if we include the search paths in the module hash we will still need 
>>>>> to re-lookup the module dependencies before loading the module, because a 
>>>>> new module.map may have showed up somewhere in the search paths since the 
>>>>> time we built the module; unless I’m missing something, I don’t see any 
>>>>> benefit in including the header search paths in the hash.
>>>>> 
>>>>> The hash should include everything that affects the way the module is 
>>>>> built. If we're transferring include paths from the user of the module to 
>>>>> the module build, the hash should include those paths, or two modules 
>>>>> with different search paths could collide in the cache.
>>>> 
>>>> The collision is avoided by using the path to the module.map as Ben 
>>>> proposes; if different search paths resolve to different module 
>>>> dependencies, this is a case where we need to rebuild, but the same is 
>>>> true if the same paths are used but a different module.map shows up as 
>>>> dependency.
>>>> 
>>>> I don't see how that's sufficient -- the include paths can also affect how 
>>>> non-modular #includes within the module behave.
>>>> 
>>>> As a somewhat-contrived example, suppose I have a library
>>>> 
>>>>   blah/
>>>>     module.map
>>>>     foo.h
>>>>     mode1/
>>>>       foo-impl.h
>>>>     mode2/
>>>>       foo-impl.h
>>>> 
>>>> ... where foo-impl.h is textually-included into foo.h, and this is 
>>>> supposed to be used in two modes: one with a -I path pointing to mode1, 
>>>> and one with a -I path pointing to mode2. We should ensure those two 
>>>> modules don't collide in the cache. (Maybe mode1 and mode2 provide inline 
>>>> assembly for different CPU variants, or such.)
>>> 
>>> 
>>> Non-modular includes are problematic beyond just how to lookup the headers 
>>> during module building. The major problem they introduce is that those same 
>>> non-modular headers can be textually #included in the translation unit, 
>>> before or after the module import that contains the same header contents, 
>>> causing multiple definitions or other semantic issues. That's why we tend 
>>> to prefer the clear semantic model that modules depend only in the headers 
>>> that the module is comprised of, or other modules, and we want to encourage 
>>> people to modularize their headers.
>>> 
>>> First, many non-modular includes are *intended* to be repeatedly textually 
>>> included, and your argument does not apply to those cases (consider Clang's 
>>> .def files, for instance). Some headers are implementation details of other 
>>> headers and are only meant to be included once, into one place; those are 
>>> similarly unaffected.
>> 
>> We already have the syntax in the module map for these; (I think) the syntax 
>> is “exclude <header>”. This is the “escape hatch” to be explicit about what 
>> non-modular, not-part-of-the-module, headers the module will include, and 
>> subsequently it is your responsibility to make sure that <header> is not 
>> textually included in the translation unit as well, or if it is that it will 
>> not cause issues.
>> 
>>> 
>>> Second, the problem you describe is mostly an artifact of our broken 
>>> implementation of macro and declaration visibility across submodules of the 
>>> same top-level module. Names in earlier-built submodule are visible in 
>>> *all* later-built submodules, even those that don't import the earlier ones 
>>> (and @imports in earlier submodules naming later ones have no effect). Once 
>>> we fix that, the requirement to modularize bottom-up (and to "fix" the 
>>> guarded typedefs in your system headers) mostly goes away -- it's generally 
>>> fine to have the same entity declared in two different places, and we'll 
>>> merge together those declarations if they're both visible.
>> 
>> Are you saying that if I have:
>> 
>> struct Foo {
>> <stuff>
>> };
>> 
>> @import Blah
>> 
>> and module ‘Blah’ has a definition for ‘struct Foo’ we will merge them 
>> together ?
>> 
>> It depends on the language.
>> 
>> In C, the two definitions for 'Foo' are compatible types if they have 
>> matching bodies, so they can be used interchangeably in that case and not 
>> otherwise. We don't yet support this, because our compatible type code 
>> doesn't know about modules, and this doesn't come up in non-modules code.
>> 
>> In C++, the two definitions for 'Foo' are required to be equivalent under 
>> the ODR, and we will merge them (and to a limited degree, diagnose them if 
>> they differ). This already basically works.
> 
> The "limited degree” makes me nervous here. We essentially would need to be 
> able to take every definition/declaration in a C/C++/ObjC transition unit and 
> merge them with definitions/declarations from modules, and make sure that we 
> diagnose and error out on differences otherwise we may cause weird/confusing 
> correctness issues.
> 
> The "limited degree" is that we only perform checks on those declarations 
> that we actually deserialize. We can perform more checking, at the cost of 
> being a bit more eager about deserialization, and I plan to add a -f flag for 
> this behavior.
> 
> This seems like a rather significant maintenance burden with lots of edge 
> cases to worry about, and not as important if you have fully modular headers.
> 
> Merging (and ODR checking) is required in C++ whether or not you've got fully 
> modularized headers, because the same template instantiation can end up in 
> multiple modules, and could be different in different places (due to 
> differing declarations present prior to the instantiation). Yes, it's a 
> little messy, but it's actually not so bad, and not much worse than merging 
> redeclaration chains when multiple independent modules choose to declare (but 
> not define) the same entity.

Merging ObjC interfaces/categories/protocols/whatnot was not something we ever 
had to do so far, I’m not so sure about the “not so bad” part..

> 
>>> I think we need to fix this submodule visibility problem regardless (the 
>>> current state of play is extremely fragile to changes in the order we 
>>> happen to build the submodules), so I'm not convinced that this is a 
>>> long-term problem. (Nonetheless, I agree that having modularized 
>>> dependencies is generally a good thing.)
>> 
>> 
> 
>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] Allow multiple modules with the same name to coexist in the module cache

Reply via email to