Re: [gradle-dev] producing multiple outputs from jvm languages

Luke Daley Mon, 28 Jan 2013 14:48:16 -0800

On 28/01/2013, at 10:45 PM, Adam Murdoch <[email protected]> wrote:


> 
> On 28/01/2013, at 10:30 PM, Luke Daley wrote:
> 
>> 
>> On 24/01/2013, at 2:36 AM, Adam Murdoch <[email protected]> wrote:
>> 
>>> 
>>> On 23/01/2013, at 8:43 PM, Luke Daley wrote:
>>> 
>>>> 
>>>> On 21/01/2013, at 12:24 AM, Adam Murdoch <[email protected]> 
>>>> wrote:
>>>> 
>>>>> 
>>>>> On 21/01/2013, at 8:51 AM, Adam Murdoch wrote:
>>>>> 
>>>>>> 
>>>>>> On 17/01/2013, at 12:10 PM, Adam Murdoch wrote:
>>>>>> 
>>>>>>> 
>>>>>>> Another question is how to group source sets and packagings.
>>>>>>> 
>>>>>>> For source sets, we currently use two approaches:
>>>>>>> 
>>>>>>> - For Java, Scala, Groovy, ANTLR and resources, we use the functional 
>>>>>>> source sets, adding the source to `sourceSets.${function}.${language}`.
>>>>>>> - For C++, we use language specific source sets, adding the source to 
>>>>>>> `cpp.sourceSets.${function}.source` and 
>>>>>>> `cpp.sourceSets.${function}.exportedHeaders`.
>>>>>>> 
>>>>>>> I'd like to come up with a consistent pattern here, which can allow 
>>>>>>> arbitrary groupings of source files by language and by function. I can 
>>>>>>> see four options. For all these options, assume that all source sets 
>>>>>>> are composable to some degree, so, for example, you can add a given 
>>>>>>> Java source set to another Java source set, or you can add a given Java 
>>>>>>> source set to a composite source set.
>>>>>>> 
>>>>>>> 1. Java plugin style, where the primary grouping is functional: 
>>>>>>> `sourceSets.${function}.${language}`:
>>>>>>> 
>>>>>>>         sourceSets {
>>>>>>>                 main {
>>>>>>>                         cpp { srcDirs = '…' }
>>>>>>>                         cppHeaders { srcDirs = '…' }
>>>>>>>                         javaScript { srcDirs = '…' }
>>>>>>>                 }
>>>>>>>         }
>>>>>>> 
>>>>>>> 2. C++ plugin style, where the primary grouping is by language: 
>>>>>>> `${language}.sourceSets.${function}`
>>>>>>> 
>>>>>>>         java { 
>>>>>>>                 sourceSets {
>>>>>>>                         main { srcDirs = '…' }
>>>>>>>                 }
>>>>>>>         }
>>>>>>>         groovy {
>>>>>>>                 sourceSets {
>>>>>>>                         main { srcDirs = '…' }
>>>>>>>                 }
>>>>>>>         }
>>>>>>>         resources {
>>>>>>>                 sourceSets {
>>>>>>>                         main { srcDirs = '…' }
>>>>>>>                 }
>>>>>>>         }
>>>>>>>         javaScript {
>>>>>>>                 sourceSets {
>>>>>>>                         main { srcDirs = '…' }
>>>>>>>                 }
>>>>>>>         }
>>>>>>> 
>>>>>>> 3. Both, where defining `sourceSets.${function}.${language}` also 
>>>>>>> defines `${language}.sourceSets.${function}` and vice versa.
>>>>>>> 
>>>>>>> 4. A polymorphic container of source sets. You use whatever groupings 
>>>>>>> you like, and can add language specific or composite source sets to 
>>>>>>> this container. The opinionated language plugins would continue to 
>>>>>>> group by function and add a `main` and `test` composite source set.
>>>>>>> 
>>>>>>>         sourceSets {
>>>>>>>                 main(CompositeSourceSet) {  // possibly the default type
>>>>>>>                         java { srcDirs = '…' }
>>>>>>>                         cpp { srcDirs = '…' }
>>>>>>>                         resources { srcDirs = '…' }
>>>>>>>                 }
>>>>>>>                 test(GroovySourceSet) {
>>>>>>>                         srcDirs = '...'
>>>>>>>                 }
>>>>>>>         }
>>>>>>> 
>>>>>> 
>>>>>> My preference at this stage is to go with option #1. Let's dig into this 
>>>>>> a bit more.
>>>>>> 
>>>>>> The goal, regardless of whichever grouping option we choose, is to 
>>>>>> introduce language specific source sets as a layer underneath this. So, 
>>>>>> we'd add types like JavaSourceSet, GroovySourceSet, CppSourceSet, and so 
>>>>>> on. These types would have some stuff in common - mainly just a set of 
>>>>>> source files - and some meta-data about the source files:
>>>>>> 
>>>>>> - For a Java source set, this would include the Java language level and 
>>>>>> Java API that the source is written against, and the compile and runtime 
>>>>>> dependencies of the source.
>>>>>> - Same for Groovy and Scala source sets, except with the Groovy and 
>>>>>> Scala languages and runtimes. The Java API is also relevant for this 
>>>>>> source, I guess. These source sets also have some way to declare the 
>>>>>> compiler macros/AST transforms that the source expects to be available, 
>>>>>> probably as a set of dependencies on the implementation libraries.
>>>>>> - For a C/C++ source set, this would include the language dialect that 
>>>>>> the source is written against, and the compile, link and runtime 
>>>>>> dependencies of the source.
>>>>>> - For an ANTLR source set, this would include the ANTLR language version 
>>>>>> that the grammars are written to.
>>>>>> - For a Javascript source set, this would include the runtime 
>>>>>> dependencies of the source.
>>>>>> 
>>>>>> One question is how to model source sets that are related to each other 
>>>>>> at the language level:
>>>>>> 
>>>>>> - C/C++ source files and their public headers and private headers.
>>>>>> - Java (Scala/Groovy) source and their resource source files.
>>>>>> - Jointly compiled Java/Scala/Groovy source files.
>>>>>> 
>>>>>> Currently, the C++ plugin groups the C++ source files and headers into a 
>>>>>> CppSourceSet, with separate SourceDirectorySets for the source and for 
>>>>>> the headers. The Jvm language plugins group the Java/Scala/Groovy and 
>>>>>> resources into a SourceSet, with separate SourceDirectorySets for each 
>>>>>> language. So, these plugins are effectively grouping the source based on 
>>>>>> the target platform, or by target output component, depending on your 
>>>>>> view.
>>>>>> 
>>>>>> Do we keep these typed groupings (or something similar), or do we model 
>>>>>> this as an untyped composite source set that contains a bunch of atomic 
>>>>>> language source sets? There are some problems with the current groupings:
>>>>>> 
>>>>>> - The ANTLR source is currently attached to the 'jvm' group, but ANTLR 
>>>>>> can generate C, C#, and a bunch of other languages.
>>>>>> - Java source and JVM byte code can be compiled to native code.
>>>>>> - Groovy, Scala and ANTLR source are added in dynamically via a 
>>>>>> convention object, so don't make use of the typing anyway. It would be 
>>>>>> the same for native languages other than C/C++ as well. So you've got 
>>>>>> this first- and second-class thing going on. It feels like a good 
>>>>>> solution should not treat certain languages specially.
>>>>>> - Sometimes multiple groups of source in a given language make up a 
>>>>>> logical group. Eg API + impl, java 5 + java 6, windows + posix, etc.
>>>>>> 
>>>>>> Let's say we remove the typed groupings (in a backwards compatible way, 
>>>>>> as always). It might work something like this:
>>>>>> 
>>>>>> - Some basic 'language' plugin adds the concept of (composite) source 
>>>>>> sets. You can define source sets and add whichever language source sets 
>>>>>> you like.
>>>>>> - A 'jvm-language' plugin adds a rule that adds a resources source set 
>>>>>> to each source set, adds the concept of JVM packagings, and a rule that 
>>>>>> knows how to copy the resource files that are inputs to a JVM packaging.
>>>>>> - A 'java-language' plugin adds a rule that adds a Java source set to 
>>>>>> each source set, and a rule that knows how to compile the Java source 
>>>>>> that are inputs to a JVM packaging.
>>>>>> - Groovy and Scala language plugins, as above.
>>>>>> - A 'standard-source-sets' plugin adds the 'main' and 'test' source sets.
>>>>>> - The 'java-base' plugin adds a rule that adds a classes dir packaging 
>>>>>> for each source set, and wires up the source set as input to the 
>>>>>> packaging.
>>>>>> - The 'java', 'groovy' and 'scala' plugins just apply various 
>>>>>> combinations of the above plugins.
>>>>>> - The 'android' plugin adds an Android resources source set to each 
>>>>>> source set (these are different to the JVM resource files above), and 
>>>>>> APK packaging, and a rule that knows how to compile the resources and 
>>>>>> classes packaging into an APK packaging.
>>>>>> - A 'native-language' plugin adds the concept of native packagings.
>>>>>> - A 'cpp-language' plugin adds a C/C++ source set, a public headers 
>>>>>> source set and a private headers source set to each source set, and a 
>>>>>> rule that compiles C/C++ source that are inputs to a native packaging.
>>>>>> - The 'cpp-lib' and 'cpp-exe' plugins just apply various combinations of 
>>>>>> the above plugins.
>>>>>> - An 'assembly-language' plugin adds an assembly source set to each 
>>>>>> source set and a rule that compiles assembly source that are inputs to a 
>>>>>> native packaging.
>>>>>> - A 'javascript-language' plugin adds a Javascript source set to each 
>>>>>> source set, and the concept of a Javascript packaging.
>>>>> 
>>>>> Let's tweak this a bit. In the above, adding support for a language also 
>>>>> adds one or more language source sets to every source set, which isn't 
>>>>> quite right. Some common cases where that doesn't reflect reality:
>>>>> 
>>>>> - A Java project uses Groovy for testing.
>>>>> - A project uses ANTLR to generate production code but not test code.
>>>>> - A Java project bundles a JNI library, so that it has Java and C 
>>>>> production code, but all the tests are written in Java.
>>>>> 
>>>>> It might be better to flip things around and infer the language source 
>>>>> sets based on what we need to build, and on convention:
>>>>> 
>>>>> - Each of the JVM-language plugins adds rules that can compile the 
>>>>> language source that forms inputs to a JVM packaging, and rules that can 
>>>>> generate the API docs from the language source that forms inputs to an 
>>>>> API documentation packaging.
>>>>> - Each of the native-language plugins adds rules that can compile the 
>>>>> language source that forms inputs to a native packaging.
>>>>> - The antlr-language plugin adds rules that can generate source from 
>>>>> ANTLR source sets that form inputs to a language source set.
>>>>> - An opinionated 'jvm-library' plugin states that for each JVM library 
>>>>> component 'n', there is a jar packaging 'n', which has an input classes 
>>>>> packaging 'n', which has as input a production source set 'n', which 
>>>>> includes a source set for each supported JVM language.
>>>>>   - This plugin may defer adding the language source set until it is 
>>>>> either referenced (to be configured) or the conventional source dir is 
>>>>> not empty.
>>>>>   - This would mean, in turn, that we wouldn't add compile tasks for 
>>>>> languages that are not required to build the classes packaging.
>>>>> - An opinionated 'jvm-unit-tests' plugin that states that each project 
>>>>> has a single 'test' classes packaging, which has as input a test source 
>>>>> set 'test', which includes a source set for each supported JVM language. 
>>>>> Plus adds the appropriate test task.
>>>>> - The java plugin applies the java-language, jvm-library and 
>>>>> unit-tests plugins, and defines a single 'main' JVM library.
>>>>> - Similarly, an opinionated 'native-component' plugin states that for 
>>>>> each native component 'n', there is a binary packaging 'n', which has as 
>>>>> input an object file packaging 'n', which has as input a production source 
>>>>> set 'n', which includes a source set for each supported native language.
>>>>> - An opinionated 'native-unit-tests' plugin, does the same kind of thing 
>>>>> as the 'jvm-unit-tests' plugin.
>>>>> - The cpp-lib and cpp-exe plugins apply the cpp-language, 
>>>>> native-component and native-unit-tests plugins and define a 'main' 
>>>>> library or executable.
>>>>> 
>>>>> In other words:
>>>>> 
>>>>> - The 'capability' plugins add classes of things, and rules to build a 
>>>>> thing from its input things.
>>>>> - The 'opinionated' plugins add instances of things, and rules that 
>>>>> state what a given thing's inputs are.
>>>> 
>>>> That seems quite right to me.
>>>> 
>>>> Could we build on the mutable polymorphic container idea for this? 
>>>> 
>>>> Ignore backwards compatibility for a second…
>>>> 
>>>> A project has a…
>>>> 
>>>> interface SourceSetContainer extends 
>>>> NamedDomainObjectCollection<SourceSet> {}
>>>> 
>>>> interface SourceSet extends Named {
>>>>    LanguageSourceSetContainer getLanguages()
>>>> }
>>>> 
>>>> interface LanguageSourceSetContainer extends 
>>>> NamedDomainObjectCollection<LanguageSourceSet> {}
>>>> 
>>>> interface LanguageSourceSet extends Named {}
>>>> 
>>>> So…
>>>> 
>>>> sourceSets {
>>>>    main {
>>>>            languages {
>>>>                    java  { }
>>>>            }
>>>>    }
>>>> }
>>>> 
>>>> If that's too much nesting, we could always offer other views, but I think 
>>>> that's the data structure. 
>>> 
>>> This is exactly what I went with in the latest revision of the spec (not 
>>> pushed yet). A SourceSet has-a container of LanguageSourceSet instances of 
>>> various types.
>>> 
>>> I'm not completely happy with the name 'languages' for this container:
>>> 
>>> * 'languages.resources' doesn't feel quite right for the resource source 
>>> files. That is, 'resources' is not really a language. And resources might 
>>> include Java source files, and so on.
>>> * We need separate 'languages' for public c++ headers, private c++ headers 
>>> and c++ source files. The distinction is important because each group of 
>>> source needs to be treated separately: headers and source files are passed 
>>> to the compiler in different ways, and public headers need to travel with 
>>> the binaries.
>>> * We need separate 'languages' for source files that are generated and not 
>>> generated. This can combine with the above, so some public c++ headers 
>>> might be generated (from a midl file, say) and some might not. And private 
>>> c++ headers might be generated (using javah, say). This distinction is 
>>> important for static analysis (e.g. don't run Checkstyle over generated 
>>> source files), for bundling the source, and for building the IDE model.
>>> 
>>> That is, there are a few different dimensions here, and implementation 
>>> language is one. We can either model each dimension, or we can give each 
>>> group of source a name and use some conventions for the names.
>> 
>> What about …
>> 
>> interface SourceSetContainerContainer extends 
>> NamedDomainObjectCollection<SourceSetContainer> {}
>> interface SourceSetContainer extends NamedDomainObjectCollection<SourceSet>, 
>> Named {}
>> interface SourceSet extends Named {}
>> 
>> So…
>> 
>> sourceSets {
>>      main {
>>              java  { }
>>              resources { }
>>              …
>>      }
>> }
> 
> 
> This would be my preference. The issue here is backwards compatibility: 
> SourceSet.java already exists as a property and is-a SourceDirectorySet. Same 
> with 'resources'. And sourceSets.main.groovy and others are mixed in using a 
> convention object, rather than adding something to a container.
> 
> I guess there are a few options to implement this in a backwards compatible 
> way: have SourceSet.getJava() return a JavaSourceSet and have JavaSourceSet 
> extend SourceDirectorySet, for example. I'd rather that LanguageSourceSet 
> has-a source property of type SourceDirectorySet, rather than is-a 
> SourceDirectorySet.
> 
> Alternatively, we might offer an alternative root namespace:
> 
> source {
>     main {
>         java { … }
>     }
> }
> 
> And have the old namespace backed by the new namespace.

This is quite appealing.
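
A rough sketch of how the two namespaces could hang together, in plain Java with no Gradle types. Every name below (SourceDirs, LanguageSourceSet, FunctionalSourceSet, LegacySourceSetView) is a made-up stand-in for illustration, not an existing Gradle API. It assumes the has-a shape suggested above (a language source set has a directory set rather than being one), and that the old `sourceSets.main.java` view delegates to the new `source.main.java` model:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SourceNamespaceSketch {

    // Hypothetical stand-in for SourceDirectorySet: just a mutable list of directories.
    static class SourceDirs {
        final List<String> srcDirs = new ArrayList<>();
    }

    // A language source set has-a SourceDirs, rather than is-a SourceDirs.
    static class LanguageSourceSet {
        final String name;
        final SourceDirs source = new SourceDirs();

        LanguageSourceSet(String name) {
            this.name = name;
        }
    }

    // New namespace: source.<function>.<language>, created on first reference.
    static class FunctionalSourceSet {
        final String name;
        private final Map<String, LanguageSourceSet> languages = new LinkedHashMap<>();

        FunctionalSourceSet(String name) {
            this.name = name;
        }

        LanguageSourceSet language(String languageName) {
            return languages.computeIfAbsent(languageName, LanguageSourceSet::new);
        }
    }

    // Old namespace: sourceSets.<function>.java still returns a directory set,
    // but it is the same object that the new model holds.
    static class LegacySourceSetView {
        private final FunctionalSourceSet delegate;

        LegacySourceSetView(FunctionalSourceSet delegate) {
            this.delegate = delegate;
        }

        SourceDirs getJava() {
            return delegate.language("java").source;
        }

        SourceDirs getResources() {
            return delegate.language("resources").source;
        }
    }

    public static void main(String[] args) {
        FunctionalSourceSet main = new FunctionalSourceSet("main");
        LegacySourceSetView legacyMain = new LegacySourceSetView(main);

        // Configure through the old namespace...
        legacyMain.getJava().srcDirs.add("src/main/java");

        // ...and read the same value back through the new one.
        System.out.println(main.language("java").source.srcDirs);
    }
}
```

With this shape, existing build scripts that set `sourceSets.main.java.srcDirs` would keep working, while anything written against the new namespace would see the same configuration. Whether the real types could be layered this way is an open question.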

-- 
Luke Daley
Principal Engineer, Gradleware 
http://gradleware.com


